Abstract: The work identifies the first lattice decoding solution
that achieves, in the general outage-limited MIMO setting and 
in the high-rate and high-SNR limit, both a vanishing gap 
to the error-performance of the (DMT optimal) exact solution 
of preprocessed lattice decoding, as well as a computational 
complexity that is subexponential in the number of codeword 
bits. The proposed solution employs lattice reduction (LR)-aided 
regularized (lattice) sphere decoding and proper timeout policies. 
These performance and complexity guarantees hold for most 
MIMO scenarios, all reasonable fading statistics, all channel 
dimensions and all full-rate lattice codes. 

In sharp contrast to the above very manageable complexity, 
the complexity of other standard preprocessed lattice decoding 
solutions is revealed here to be extremely high. Specifically 
the work is first to quantify the complexity of these lattice 
(sphere) decoding solutions and to prove the surprising result 
that the complexity required to achieve a certain rate-reliability 
performance, is exponential in the lattice dimensionality and in 
the number of codeword bits, and it in fact matches, in common 
scenarios, the complexity of ML-based solutions. Through this 
sharp contrast, the work was able to, for the first time, rigorously 
demonstrate and quantify the pivotal role of lattice reduction as 
a special complexity reducing ingredient. 

Finally the work analytically refines transceiver DMT analysis 
which generally fails to address potentially massive gaps between 
theory and practice. Instead the adopted vanishing gap condition 
guarantees that the decoder's error curve is arbitrarily close, 
given a sufficiently high SNR, to the optimal error curve 
of exact solutions, which is a much stronger condition than 
DMT optimality which only guarantees an error gap that is 
subpolynomial in SNR, and can thus be unbounded and generally 
unacceptable for practical implementations. 



I. Introduction 

The work applies to the general setting of outage-limited
MIMO communications, where MIMO techniques offer significant
advantages in terms of increased throughput and reliability,
although at the cost of a potentially much higher computational
complexity for decoding at the receivers. This high complexity
brings to the fore the need for efficient decoders that trade off
error-performance with complexity in a better manner than
computationally expensive decoders like the strictly optimal
maximum-likelihood (ML) decoder.

Specifically, in terms of ML-based decoding, the brute-force
ML decoder introduces a complexity that scales exponentially
with the number of codeword bits. If, on the other hand, a small
gap to the exact ML performance is acceptable, then
branch-and-bound algorithms such as the sphere decoder (SD)
are known to require reduced computational resources. Despite
the reduced complexity of sphere decoding, recent work in [1]
has revealed that, to achieve a vanishing error-gap to optimal
ML solutions, even such branch-and-bound algorithms generally
require computational resources that, albeit significantly
smaller than those required by a brute-force ML decoder, again
grow exponentially in the rate and the dimensionality, and
remain prohibitive for several MIMO scenarios.

This high complexity required by ML-based decoding solutions
serves as further motivation for exploring other families
of decoding methods. A natural alternative is lattice decoding,
obtained by simply removing the constellation boundaries of
the ML-based search, an action that, loosely speaking, exploits
a certain symmetry which in turn may yield faster
implementations. Even with lattice decoding, though, the
computational complexity can be prohibitive: finding the
exact solution to the lattice decoding problem is generally an
NP-hard problem (cf. [2]). At the same time, the other
extreme of very early terminations of lattice decoding, such
as linear solutions, is known to achieve computational
efficiency at the expense of a very sizable, and often
unbounded, gap to the exact solution of the lattice decoding
problem.

In this work we explore lattice decoding solutions that, in 
conjunction with terminating policies, strike the proper balance 
between this exponential complexity and exponential gap. 

A. System model 

We consider the general $m \times n$ point-to-point multiple-input
multiple-output model given by

$$\mathbf{y} = \sqrt{\rho}\,\mathbf{H}\mathbf{x} + \mathbf{w} \qquad (1)$$

where $\mathbf{x} \in \mathbb{R}^m$, $\mathbf{y} \in \mathbb{R}^n$ and $\mathbf{w} \in \mathbb{R}^n$ respectively denote
the transmitted codewords, the received signal vectors, and
the additive white Gaussian noise with unit variance, where
the parameter $\rho$ takes the role of the signal to noise ratio
(SNR), and where the fading matrix $\mathbf{H} \in \mathbb{R}^{n \times m}$ is assumed
to be random, with elements drawn from arbitrary statistical
distributions. We consider that one use of (1) corresponds to
$T$ uses of some underlying "physical" channel. We further
assume the transmitted codewords $\mathbf{x}$ to be uniformly distributed
over some codebook $\mathcal{X} \subset \mathbb{R}^m$, to be statistically independent
of the channel $\mathbf{H}$, and to satisfy the power constraint

$$\mathbb{E}\{\|\mathbf{x}\|^2\} \le T. \qquad (2)$$
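As an illustration of the model in (1) and the power constraint in (2), the following minimal sketch draws one channel use. The dimensions, the Gaussian fading choice, and the scaled ±1 codeword are hypothetical choices made only for this example; the model itself allows arbitrary fading statistics and codebooks.

```python
import numpy as np

def simulate_channel(H, x, rho, rng):
    """Return y = sqrt(rho) * H x + w, with unit-variance real Gaussian noise w."""
    n = H.shape[0]
    w = rng.standard_normal(n)
    return np.sqrt(rho) * H @ x + w

rng = np.random.default_rng(0)
m, n, T = 4, 4, 1                  # hypothetical dimensions and block length
rho = 10.0 ** (20 / 10)            # SNR of 20 dB on a linear scale
H = rng.standard_normal((n, m))    # Gaussian fading: one possible statistic among many
# Hypothetical codeword: +/-1 entries scaled so that ||x||^2 = T, which satisfies (2)
x = rng.choice([-1.0, 1.0], size=m) * np.sqrt(T / m)
y = simulate_channel(H, x, rho, rng)
```

Here `simulate_channel` is an illustrative helper name, not something defined in the paper.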



B. Rate, reliability and complexity in outage-limited MIMO 
communications 

In terms of error performance, we let $P_e$ denote the
probability of codeword error, and we consider the rate

$$R = \frac{1}{T}\log|\mathcal{X}| \qquad (3)$$

in bits per channel use (bpcu), where $|\mathcal{X}|$ denotes the
cardinality of $\mathcal{X}$.

Regarding complexity, we let $N_{\max}$ describe the
computational resources, in floating point operations (flops) per $T$
channel uses, that the transceiver is endowed with, in the sense
that after $N_{\max}$ flops, the transceiver must simply terminate,
potentially prematurely and before completion of its task. We
note that naturally, $N_{\max}$ is intimately intertwined with the
desired $P_e$ and $R$, and that any attempt to significantly reduce
$N_{\max}$ may be at the expense of a substantial degradation in
error-performance.

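In the high SNR regime, a given encoder $\mathcal{X}_r$ and decoder
$\mathcal{D}_r$ are said to achieve a multiplexing gain $r$ (cf. [3]) and
diversity gain $d(r)$ if

$$\lim_{\rho\to\infty}\frac{R(\rho)}{\log\rho} = r \quad\text{and}\quad -\lim_{\rho\to\infty}\frac{\log P_e}{\log\rho} = d(r). \qquad (4)$$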



In the same high SNR regime, the complexity is here chosen
to take the form

$$c(r) := \lim_{\rho\to\infty}\frac{\log N_{\max}}{\log\rho}, \qquad (5)$$

which is henceforth denoted as the complexity exponent.
Noting that $R = r\log\rho$, we observe that $c(r) > 0$ implies
a complexity that is exponential in the rate.

Remark 1: A reasonable question at this point is why the
computational resources $N_{\max}$ scale with $\rho$ and depend on $r$.
We note that the complexity of decoding generally depends on
the density of the codebook, which in turn depends on $\rho$ and $R$.
Furthermore, this dependence of the complexity exponent (and by
extension of $N_{\max}$) on $r$ reflects a potential ability to regulate
the computational resources depending on the rate. Finally, the
fact that both $P_e$ and $N_{\max}$ are represented as polynomial
functions of $\rho$ simply stems from the fact that both $P_e$ and
$|\mathcal{X}|$ naturally scale as polynomial functions of $\rho$. Specifically,
we note that $c(r)$ captures the entire complexity range

$$0 \le c(r) \le rT$$

of all reasonable transceivers, with $c(r) = 0$ corresponding
to the fastest possible transceiver (requiring a subexponential
number of flops per $T$ channel uses), and with $c(r) = rT$
corresponding to the optimal but arguably slowest, full-search
uninterrupted ML decoder^1 in the presence of a canonical code
with multiplexing gain $r$, i.e., with $|\mathcal{X}_r| = \rho^{rT}$.

If this canonical code is linear, though, searching the entire
codebook can be avoided by algorithmic solutions like the
sphere decoder (SD), which can provide substantial complexity
reductions at a potentially small loss in error performance. Such
solutions take advantage of the linear nature of the code, which
is defined by a generator matrix $\mathbf{G}$ and a shaping region $\mathcal{R}$.
Specifically for $r > 0$, a (sequence of) full-rate linear (lattice)
code(s) $\mathcal{X}_r$ is given by $\mathcal{X}_r = \Lambda_r \cap \mathcal{R}$ where $\Lambda_r = \rho^{-rT/\kappa}\Lambda$ and
$\Lambda = \{\mathbf{G}\mathbf{s} \mid \mathbf{s} \in \mathbb{Z}^{\kappa}\}$, where $\mathbb{Z}^{\kappa}$ denotes the $\kappa = \min\{m, n\}$
dimensional integer lattice, where $\mathcal{R}$ is a compact convex
shaping region that is independent of $\rho$, and where $\mathbf{G} \in \mathbb{R}^{m \times \kappa}$
is full rank and independent of $\rho$. For the class of lattice codes
considered here, the codewords take the form

$$\mathbf{x} = \rho^{-rT/\kappa}\,\mathbf{G}\mathbf{s}, \qquad \mathbf{s} \in \mathbb{S}_r^{\kappa} := \mathbb{Z}^{\kappa} \cap \rho^{rT/\kappa}\mathcal{R}', \qquad (6)$$

where $\mathcal{R}' \subset \mathbb{R}^{\kappa}$ is a natural bijection of the shaping region
$\mathcal{R}$ that preserves the code, and where $\mathcal{R}$ contains the all zero
vector $\mathbf{0}$.
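To make the scaling of the symbol set in (6) concrete, the sketch below counts the integer vectors inside a scaled cube and compares the resulting rate, computed as in (3), with $r \log\rho$. The cubic shaping region $[-1/2, 1/2)^\kappa$, the base-2 logarithm, and the numerical values are assumptions made for illustration only.

```python
import math

def count_symbol_vectors(rho, r, T, kappa):
    """Count integer vectors in rho^(rT/kappa) * [-1/2, 1/2)^kappa (assumed cubic shaping)."""
    side = rho ** (r * T / kappa)                            # edge length of the scaled cube
    per_dim = math.ceil(side / 2) - math.ceil(-side / 2)     # integers z with -side/2 <= z < side/2
    return per_dim ** kappa

rho, r, T, kappa = 64.0, 1.0, 1, 2      # hypothetical values
size = count_symbol_vectors(rho, r, T, kappa)
rate = math.log2(size) / T              # bits per channel use, as in (3)
print(size, rate, r * math.log2(rho))
```

In this example the cardinality equals $\rho^{rT}$ exactly because the scaled edge length is an integer; in general the two agree only in the exponential scale of interest.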

As noted before, despite the reduced complexity of sphere
decoding of such lattice codes (as compared to brute-force
ML decoding), recent work in [1] has revealed that even such
branch-and-bound algorithms generally require computational
resources that grow exponentially in the number of codeword
bits and the dimensionality. As an indicative example of this
high complexity, we note that the work in [1] showed that
such SD algorithms, when applied for decoding a large family
of high-performing codes including all known full-rate DMT
optimal codes, over the $n_T \times n_R$ quasi-static MIMO channel
with Rayleigh fading and $n_R \ge n_T$, introduce a complexity
exponent of the form

$$c(r) = \frac{T}{n_T}\Big(r(n_T - \lfloor r\rfloor - 1) + \big(n_T\lfloor r\rfloor - r(n_T - 1)\big)^+\Big). \qquad (7)$$



In the above, $\lfloor r\rfloor$ denotes the largest integer not greater than
$r$. The exponent, which simplifies to $c(r) = \frac{T}{n_T}\,r(n_T - r)$
for integer values of $r$, reaches at $r = n_T/2$ (for even
values of $n_T$) an overall maximum value of $n_T T/4$ which,
for the aforementioned codes, is equal to $\kappa/8$, corresponding
to complexity in the order of $2^{\frac{\kappa}{8}\log\rho} = \rho^{\kappa/8} = \sqrt{|\mathcal{X}|}$.
At any fixed multiplexing gain, these required computational
resources can be seen to be in the order of $2^{\frac{RT}{n_T}(n_T - r)}$ flops,
which reveals a complexity that is exponential in the number
of codeword bits, and a corresponding exponential slope of 



^1 We here note that, strictly speaking, $\mathcal{X}_r, \mathcal{D}_r$ may potentially introduce a
complexity exponent larger than $rT$. In such a case though, $\mathcal{X}_r, \mathcal{D}_r$ may
be substituted by a lookup table implementation of $\mathcal{X}_r$ and an unrestricted
ML decoder. This encoder-decoder will jointly require resources that are a
constant multiple of $|\mathcal{X}_r| = \rho^{rT}$, as it has to construct and visit all possible
$|\mathcal{X}_r|$ codewords, at a computational cost of a bounded number of flops per
codeword visit. It is noted that the number of flops per visited codeword is
naturally independent of $\rho$.

^2 Although premature at this point, we hasten to note for the expert reader
that this complexity indeed holds irrespective of the radius updating policy,
irrespective of the decoding ordering, and, as we will see later on, holds even
in the presence of MMSE preprocessing.
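The piecewise linear exponent in (7) can be evaluated numerically. The sketch below is a direct transcription of (7), with the function name and the example values $n_T = T = 4$ chosen only for illustration; it checks the integer-$r$ simplification and the maximum value $n_T T/4$ quoted in the text.

```python
import math

def sd_complexity_exponent(r, n_T, T):
    """Evaluate the complexity exponent of (7) at multiplexing gain r."""
    floor_r = math.floor(r)
    first = r * (n_T - floor_r - 1)
    second = max(0.0, n_T * floor_r - r * (n_T - 1))   # the (.)^+ operation
    return (T / n_T) * (first + second)

n_T, T = 4, 4                                 # hypothetical example values
grid = [i / 100 for i in range(0, 100 * n_T + 1)]
c_max = max(sd_complexity_exponent(r, n_T, T) for r in grid)
print(sd_complexity_exponent(2, n_T, T), c_max, n_T * T / 4)
```

For integer $r$ the second term equals $r$, so the sum collapses to $r(n_T - r)$, which is what the simplified expression states.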



C. Transition to lattice decoding for reducing complexity 

As mentioned, this high complexity of ML-based (constrained)
decoders motivates consideration of other decoder
families, with a natural alternative being the unconstrained
(naive) lattice decoder which takes the general form

$$\hat{\mathbf{x}}_L = \arg\min_{\hat{\mathbf{x}} \in \Lambda_r}\|\mathbf{y} - \sqrt{\rho}\,\mathbf{H}\hat{\mathbf{x}}\|^2. \qquad (8)$$

Naturally, when $\hat{\mathbf{x}}_L \notin \mathcal{X}_r$, the decoder declares an error.

The use of lattice decoding, and specifically of preprocessed
lattice decoding in MIMO communications, has received
substantial attention from works like [4], [5] and [6], where the
latter proved that lattice decoding in the presence of MMSE
preprocessing achieves the optimal DMT for specific MIMO
channels and statistics, and for DMT-optimal random codes.
The use of lattice decoding as an alternative to computationally
expensive ML-based solutions was recently further validated
on the one hand by the aforementioned work in [1], [7] which
revealed the large computational disadvantages of ML-based
solutions, and on the other hand by the work in [8] which
further confirmed the performance advantages of lattice
decoding by showing that regularized (MMSE-preprocessed^3) lattice
decoding achieves the optimal DMT performance, for almost
all MIMO scenarios and fading statistics, and all non-random
lattice codes, irrespective of the codes' ML performance.

The aforementioned extreme complexity of exact lattice decoding
solutions, in conjunction with the potentially unbounded
error-performance degradation (gap) of very early terminations
(as opposed to exact implementations) of lattice decoding, brings
to the fore the need for approximations of lattice decoding
solutions that better balance the very sizable complexity and
gap. Specifically, for any simplified variant $\mathcal{D}_r$ of the baseline
(exact) MMSE-preprocessed lattice decoder, this gap can, in the
high SNR regime, be quantified as

$$g_L(c) := \lim_{\rho\to\infty}\frac{P_e}{P(\hat{\mathbf{x}}_L \ne \mathbf{x})} \qquad (9)$$



where $P(\hat{\mathbf{x}}_L \ne \mathbf{x})$ describes the probability of error of the exact
MMSE-preprocessed lattice decoder, where $P_e$ denotes the
probability of error of $\mathcal{D}_r$, and where $c$ (i.e., $c(r)$) is the
complexity exponent that describes the (asymptotic rate of
increase of the) computational resources required to achieve
this performance gap. Generally a smaller computational
complexity exponent $c$ implies a larger gap $g_L(c)$. The clear
task has remained for some time to construct decoders that
optimally traverse this tradeoff between $g$ and $c$, i.e., that
reduce the performance gap to the exact lattice decoding
solution, with reasonable computational complexity. Equivalently,
for $N_{\max}(g)$ denoting the computational resources in flops
required to achieve a certain gap $g$ to the baseline exact
^3 We will interchangeably use MMSE-preprocessed decoder and regularized
decoder, with the first term being more commonly used, and with the second
implying a more general family of decoders (cf. where the equivalence
between the two decoders is discussed.). Even though in the asymptotic
setting of interest, the two accept the same results throughout the paper, some
extra error-performance gains can be achieved by proper optimization of the
regularized decoder (cf. (9)).



MMSE-preprocessed lattice decoder, the above task can be
described, in the high SNR regime, as trying to minimize

$$\lim_{\rho\to\infty}\frac{\log N_{\max}(g)}{\log\rho}.$$

This will be achieved later on.

D. Contributions 

We first show that the computational complexity required
by the MMSE-preprocessed (unconstrained) lattice sphere
decoder asymptotically matches the complexity of the
(constrained) ML-based (MMSE-preprocessed or not) sphere
decoders, and is commonly exponential in the dimensionality
and the number of codeword bits. This is established for a
large class of codes of arbitrary error-performance, a large
class of fading statistics, and specifically for the quasi-static
MIMO channel; for example, the complexity required for
DMT optimal lattice sphere decoding, in the presence of a
large family of DMT optimal codes, takes the previously
seen simple piecewise linear form in (7). In a parenthetical
note, and deviating slightly from the spirit of this paper, we
also provide a universal upper bound on the complexity of
regularized lattice sphere decoding, which holds irrespective of
the lattice code applied and irrespective of the fading statistics.
This upper bound again takes the form in (7), matching that
in the case of constrained ML-based sphere decoding, thus
revealing the surprising fact that there exists no statistical
channel behavior that will allow the removal of the bounding
region to cause unbounded increases in the complexity of the
decoder^4.

With provable evidence of the very high complexity of
regularized lattice decoding, we turn to the powerful tool
of lattice reduction and seek to understand its effects on
computational complexity. While there has existed a general
agreement in the community that lattice reduction does reduce
complexity (cf. [10]), this has not yet been supported
analytically in any relevant communication settings. In fact, and quite
opposite to common wisdom, it was recently shown that for
a fixed-radius^5 sphere decoding implementation of the naive
lattice decoder [11], LR does not improve the sphere decoder
complexity tail exponent.

What our present work shows is that lattice reduction
reduces an ML-like exponentially increasing complexity to
very manageable subexponential values. We specifically
proceed to prove that the LR-aided regularized lattice decoder,
implemented by a fixed-radius sphere decoder and timeout
policies that occasionally abort decoding and declare an error,
achieves

$$g_L(\epsilon) = 1, \qquad \lim_{\rho\to\infty}\frac{\log N_{\max}(g)}{\log\rho} = 0, \qquad \forall \epsilon > 0,\ g > 1,$$



i.e., achieves a vanishing gap to the exact implementation of 
regularized lattice decoding and does so with a complexity 

^4 In other words, this complexity bound holds even if the channel statistics
are such that the channel realizations cause the decoder to always have to
solve the hardest possible lattice search problem.

^5 The radius here is considered fixed in the sense that it does not vary with
respect to the channel realization and rate.



exponent that vanishes to zero, which in turn implies
subexponential complexity in the sense that the complexity scales
slower than any conceivable exponential function. It is finally
noted that this vanishing gap approach serves the practical
purpose of an analytical refinement over basic diversity
analysis, which generally fails to address potentially massive gaps
between theory and practice.

E. Notation 

We use $\doteq$ to denote the exponential equality, i.e., we write
$f(\rho) \doteq \rho^{B}$ to denote $\lim_{\rho\to\infty}\frac{\log f(\rho)}{\log\rho} = B$, and $\dot{\le}$, $\dot{\ge}$ are
similarly defined. With this notation, we can write $P_e \doteq \rho^{-d(r)}$
(cf. (4)). In this paper we use $\lceil\cdot\rceil$ to denote the smallest integer
not smaller than the argument, $\lfloor\cdot\rfloor$ to denote the largest integer
not larger than the argument, $(\cdot)^H$ to denote the conjugate
transpose of $(\cdot)$, $(\cdot)^+$ to denote $\max\{0, (\cdot)\}$ and $\mathrm{vec}(\cdot)$ to
denote the operation whereby the columns of the argument $(\cdot)$
are stacked to form a vector.

II. MMSE-Preprocessed Lattice Sphere Decoding 
Complexity 

We proceed to describe the preprocessed lattice decoder, 
its sphere decoding implementation, and for a practical set- 
ting of interest that includes the quasi-static MIMO channel 
and common codes, to establish the decoder's computational 
complexity. 

A. Lattice sphere decoding 

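Combining (1) and (6) yields the equivalent model

$$\mathbf{y} = \mathbf{M}_r\mathbf{s} + \mathbf{w} \qquad (10)$$

where

$$\mathbf{M}_r := \rho^{\frac{1}{2} - \frac{rT}{\kappa}}\,\mathbf{H}\mathbf{G} \in \mathbb{R}^{n \times \kappa} \qquad (11)$$

is a function of the multiplexing gain^6 $r$.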

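Consequently the corresponding naive lattice decoder in (8)
takes the form (see for example [8], also [10])

$$\hat{\mathbf{s}}_L = \arg\min_{\hat{\mathbf{s}} \in \mathbb{Z}^{\kappa}}\|\mathbf{y} - \mathbf{M}\hat{\mathbf{s}}\|^2. \qquad (12)$$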

As a result of neglecting the boundary region, though, the
above decoder declares additional errors if $\hat{\mathbf{s}}_L \notin \mathbb{S}_r^{\kappa}$, resulting
in possible performance costs. These costs motivated the
use of MMSE preprocessing, which essentially regularizes
the decision metric to penalize vectors outside the boundary
constraint $\mathbb{S}_r^{\kappa}$ (cf. [8]). Specifically, the MMSE-preprocessed
lattice decoder is obtained by implementing an unconstrained
search over the MMSE-preprocessed lattice, and takes the
form

$$\hat{\mathbf{s}}_{r\text{-}ld} = \arg\min_{\hat{\mathbf{s}} \in \mathbb{Z}^{\kappa}}\|\mathbf{F}\mathbf{y} - \mathbf{R}\hat{\mathbf{s}}\|^2, \qquad (13)$$



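where $\mathbf{F}$ and $\mathbf{R}$ are respectively the MMSE forward and
feedback filters such that $\mathbf{F} = \mathbf{R}^{-H}\mathbf{M}^H$, where

$$\mathbf{R}^H\mathbf{R} = \mathbf{M}^H\mathbf{M} + \alpha_r^2\mathbf{I}, \qquad (14)$$

where $\alpha_r = \rho^{-rT/\kappa}$ and where $\mathbf{R}$ is an upper-triangular matrix
(more details can be found in Appendix D). For $\mathbf{r} = \mathbf{F}\mathbf{y}$, the
model transitions from (10) to

$$\begin{aligned}
\mathbf{r} &= \mathbf{R}^{-H}\mathbf{M}^H\mathbf{M}\mathbf{s} + \mathbf{R}^{-H}\mathbf{M}^H\mathbf{w} \\
&= \mathbf{R}^{-H}(\mathbf{R}^H\mathbf{R} - \alpha_r^2\mathbf{I})\mathbf{s} + \mathbf{R}^{-H}\mathbf{M}^H\mathbf{w} \\
&= \mathbf{R}\mathbf{s} - \alpha_r^2\mathbf{R}^{-H}\mathbf{s} + \mathbf{R}^{-H}\mathbf{M}^H\mathbf{w} \\
&= \mathbf{R}\mathbf{s} + \mathbf{w}', \qquad (15)
\end{aligned}$$

where

$$\mathbf{w}' = -\alpha_r^2\mathbf{R}^{-H}\mathbf{s} + \mathbf{R}^{-H}\mathbf{M}^H\mathbf{w} \qquad (16)$$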



(16) 



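is the equivalent noise that includes self-interference (first
summand) and colored Gaussian noise. Consequently the
corresponding regularized lattice decoder takes the form

$$\hat{\mathbf{s}}_{r\text{-}ld} = \arg\min_{\hat{\mathbf{s}} \in \mathbb{Z}^{\kappa}}\|\mathbf{r} - \mathbf{R}\hat{\mathbf{s}}\|^2 \qquad (17)$$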
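The MMSE preprocessing step of (13)-(16) can be checked numerically. The following is a minimal sketch for the real-valued model, so the conjugate transpose reduces to a plain transpose; the Cholesky-based construction of $\mathbf{R}$, the function name, and all numerical values are assumptions made for illustration rather than the paper's implementation.

```python
import numpy as np

def mmse_filters(M, alpha):
    """Return (F, R) with R upper-triangular, R^T R = M^T M + alpha^2 I, F = R^{-T} M^T."""
    kappa = M.shape[1]
    gram = M.T @ M + alpha**2 * np.eye(kappa)
    R = np.linalg.cholesky(gram).T       # numpy returns the lower factor; transpose it
    F = np.linalg.solve(R.T, M.T)        # solves R^T F = M^T
    return F, R

rng = np.random.default_rng(0)
n, kappa, alpha = 4, 4, 0.1              # hypothetical sizes and regularization value
M = rng.standard_normal((n, kappa))
s = rng.integers(-3, 4, size=kappa)      # hypothetical integer symbol vector
w = rng.standard_normal(n)

F, R = mmse_filters(M, alpha)
r = F @ (M @ s + w)                                      # r = F y
w_eq = -alpha**2 * np.linalg.solve(R.T, s) + F @ w       # equivalent noise of (16)
print(np.allclose(r, R @ s + w_eq))
```

The final comparison is the identity $\mathbf{r} = \mathbf{R}\mathbf{s} + \mathbf{w}'$ of (15), evaluated for one random draw.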



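which is then solved by the sphere decoder, which recursively
enumerates all lattice vectors $\hat{\mathbf{s}} \in \mathbb{Z}^{\kappa}$ within a given sphere of
radius $\xi > 0$, i.e., which identifies as candidates the vectors $\hat{\mathbf{s}}$
that satisfy

$$\|\mathbf{r} - \mathbf{R}\hat{\mathbf{s}}\|^2 \le \xi^2. \qquad (18)$$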

The algorithm specifically uses the upper-triangular nature
of $\mathbf{R}$ to recursively identify partial symbol vectors $\hat{\mathbf{s}}_k$, $k = 1, \cdots, \kappa$,
for which

$$\|\mathbf{r}_k - \mathbf{R}_k\hat{\mathbf{s}}_k\|^2 \le \xi^2, \qquad (19)$$

where $\hat{\mathbf{s}}_k$ and $\mathbf{r}_k$ respectively denote the last $k$ components of
$\hat{\mathbf{s}}$ and $\mathbf{r}$, and where $\mathbf{R}_k$ denotes the $k \times k$ lower-right submatrix
of $\mathbf{R}$. Clearly any set of vectors $\hat{\mathbf{s}} \in \mathbb{Z}^{\kappa}$ with common last
$k$ components that fail to satisfy (19) may be excluded from
the set of candidate vectors that satisfy (18).

The enumeration of partial symbol vectors $\hat{\mathbf{s}}_k$ is equivalent
to the traversal of a regular tree with $\kappa$ layers, one layer per
symbol component of the symbol vectors, such that layer $k$
corresponds to the $k$th component of the transmitted symbol
vector^7 $\mathbf{s}$. There is a one-to-one correspondence between the
nodes at layer $k$ and the partial vectors $\hat{\mathbf{s}}_k$. We say that
a node is visited by the sphere decoder if and only if the
corresponding partial vector $\hat{\mathbf{s}}_k$ satisfies (19), i.e., there is a
bijection between the visited nodes at layer $k$ and the set

$$\mathcal{N}_k := \{\hat{\mathbf{s}}_k \in \mathbb{Z}^{k} \mid \|\mathbf{r}_k - \mathbf{R}_k\hat{\mathbf{s}}_k\|^2 \le \xi^2\}. \qquad (20)$$

B. Complexity of MMSE-preprocessed lattice sphere decoding 

Consequently the total number of visited nodes (in all layers
of the tree) is given by

$$N_{SD} = \sum_{k=1}^{\kappa} N_k, \qquad (21)$$

where $N_k := |\mathcal{N}_k|$ is the number of visited nodes at layer $k$ of
the search tree. The total number of visited nodes is commonly
taken as a measure of the sphere decoder complexity. It is
easy to show that in the scale of interest the SD complexity
exponent $c(r)$ would not change if instead of considering the
number of visited nodes, we considered the number of flops
spent by the decoder^8.

^6 For simplicity of notation we will, in most cases, denote $\mathbf{M}_r$ with $\mathbf{M}$.

^7 We will henceforth refer to the symbol vector $\mathbf{s} \in \mathbb{S}_r^{\kappa}$ corresponding to
the transmitted codeword $\mathbf{x} = \rho^{-rT/\kappa}\mathbf{G}\mathbf{s}$ (cf. (6)), simply as the transmitted
symbol vector.

Naturally the total number of visited nodes is a function
of the search radius $\xi$. We here use a fixed radius, which
may result in a non-zero probability that the transmitted
symbol vector $\mathbf{s}$ is not in $\mathcal{N}_{\kappa}$. Consequently we must choose a
radius that strikes the proper balance between decreasing the
aforementioned probability and at the same time sufficiently
decreasing the size of $\mathcal{N}_k$. Towards this we note that for the
transmitted symbol vector $\mathbf{s}$, the metric in (17) satisfies

$$\|\mathbf{r} - \mathbf{R}\mathbf{s}\|^2 = \|\mathbf{w}'\|^2,$$

which means that if $\|\mathbf{w}'\| > \xi$, then the transmitted symbol
vector is excluded from the search, resulting in a decoding
error. As Lemma 2 will later argue, taking into consideration
the self-interference and non-Gaussianity of $\mathbf{w}'$, we can set
$\xi = \sqrt{z\log\rho}$, for some $z > d(r)$ such that

$$P(\|\mathbf{w}'\|^2 > \xi^2) \,\dot{<}\, \rho^{-d(r)},$$

which implies a vanishing probability of excluding the
transmitted information vector from the search, and a vanishing
degradation of error performance.

We here note that the MMSE-preprocessed lattice sphere
decoder differs from its ML-based equivalent in two aspects:
the presence of MMSE preprocessing and the absence of a
bounding region to constrain the search. These two aspects
are generally perceived to have an opposite effect on the
complexity. On the one hand, MMSE preprocessing, which
we recall from (20) to introduce unpruned sets

$$\mathcal{N}_k = \{\hat{\mathbf{s}}_k \in \mathbb{Z}^{k} \mid \|\mathbf{r}_k - \mathbf{R}_k\hat{\mathbf{s}}_k\|^2 \le \xi^2\}, \qquad k = 1, \cdots, \kappa,$$

is associated with reduced complexity in lattice-based SD
solutions (cf. [11]) due to the resulting penalization of faraway
lattice points (cf. IHl). On the other hand, the absence of 
boundary constraints can be associated to increased complex- 
ity as it introduces an unbounded number of candidate vectors. 
We proceed to show that in terms of the complexity exponent,
under common MIMO scenarios and codes, these two aspects
exactly cancel each other out, and that consequently
MMSE-preprocessed lattice sphere decoding introduces a complexity
exponent that matches that of ML-based sphere decoding
(cf. [1]), which itself is shown here to also match the
complexity exponent of ML-based SD in the presence of
MMSE preprocessing^9.

Before proceeding we note that this analysis is specific to 
sphere decoding, and that it does not account for any other 
ML based solutions that could, under some (arguably rare) 
circumstances, be more efficient. A classical example of such 
rare circumstances would be a MIMO scenario, or equivalently 

^8 To see this, we consider that the cost of visiting a node is independent
of $\rho$. Once at a visited node, this same bounded cost includes the cost of
establishing which children-nodes not to visit in the next layer.

^9 We clarify that ML-based SD in the presence of MMSE preprocessing
corresponds to unpruned sets $\mathcal{N}_k \cap \mathbb{S}_r^{k}$, where $\mathbb{S}_r^{k}$ is the $k$-dimensional set
resulting from the natural reduction of $\mathbb{S}_r^{\kappa}$ from (6).



a set of fading statistics, that always generate diagonal channel
matrices. Another example would be having codes drawn from
orthogonal designs, which introduce very small decoding
complexity, but which are provably shown to be highly suboptimal
except for very few unique cases like the $n_T = 2$, $n_R = 1$
quasi-static case [12]. In light of this, in this section only, we
mainly focus on the widely considered $n_T \times n_R$ ($n_R \ge n_T$)
i.i.d. and quasi-static MIMO setting and on the large but
specific family of full-rate ($\kappa = 2\min\{n_T, n_R\}T = 2n_T T$)
threaded codes (cf. [13]-[16]), which includes all known DMT
optimal codes as well as uncoded transmission (V-BLAST).

We proceed with the main Theorem of the section, which 
applies under natural detection ordering (cf. lH], lIsV), and 
under the assumption of i.i.d. regular fading statistics^10.

Theorem 1: The complexity exponent for MMSE- 
preprocessed lattice sphere decoding any full-rate threaded 
code over the quasi-static MIMO channel with i.i.d. regular 
fading statistics, is equal to the complexity exponent of 
ML-based SD with or without MMSE preprocessing. 

Proof: See Appendix A. ■

We clarify that even though all three decoders are DMT
optimal, the above result incorporates more than just DMT
optimal decoding, in the sense that any timeout policy will
trade off $d(r)$ with $c(r)$ identically for ML-based and
lattice-based sphere decoding. In other words the three decoders share
the same $d(r)$ and $c(r)$ capabilities, irrespective of the timeout
policy.

Furthermore, considering different SD detection orderings 
(cf. lO), the following extends the range of codes for which 
the ML-based and lattice-based SD share a similar complexity. 
The proof follows from the proof of Theorem 1 in Appendix A
and from Theorem 4 in [1].

Corollary 1a: Given any full-rate code of arbitrary DMT
performance, there is always at least one non-random fixed
permutation of the columns of $\mathbf{G}$, for which the complexity
exponent of the MMSE-preprocessed lattice sphere decoder
matches that of the ML-based sphere decoder.

The following focuses on a specific example of practical 
interest. 

Corollary 1b: The complexity exponent for DMT optimal
MMSE-preprocessed lattice sphere decoding of minimum delay
($T = n_T$) DMT optimal threaded codes over the quasi-static
MIMO channel with i.i.d. regular fading statistics takes
the following form

$$c_{r\text{-}ld}(r) = r(n_T - \lfloor r\rfloor - 1) + \big(n_T\lfloor r\rfloor - r(n_T - 1)\big)^+, \qquad (22)$$

which simplifies to

$$c_{r\text{-}ld}(r) = r(n_T - r) \qquad (23)$$

for integer values of $r$.

Proof: See Appendix B. ■

^10 The i.i.d. regular fading statistics satisfy the general set of conditions as
described in [17], where a) the near-zero behavior of the fading coefficients $h$
is bounded in probability as $c_1|h|^t \le p(h) \le c_2|h|^t$ for some positive and
finite $c_1$, $c_2$ and $t$, where b) the tail behavior of $h$ is bounded in probability
as $p(h) \le c_2 e^{-b|h|^{\beta}}$ for some positive and finite $c_2$, $b$ and $\beta$, and where c)
$p(h)$ is upper bounded by a constant $K$.

Further evidence that connects the complexity behavior of 
MMSE-preprocessed lattice-based SD, with that of its ML- 
based counterpart, now comes in the form of a non-trivial 
universal bound that is shared by the two methods. This is par- 
ticularly relevant because unconstrained lattice decoding could 
conceivably require unbounded computational resources given 
the unbounded number of candidate lattice points. Specifically 
the following universal upper bound on the complexity of 
regularized lattice-based SD matches the upper bound in [1]
for the ML case, and it holds irrespective of the full-rate 
lattice code applied and irrespective of the fading statistics. 
The generality with respect to the fading statistics is important 
because it guarantees that no set of fading statistics, even those 
that always generate infinitely dense lattices, can cause an 
unbounded increase in the complexity due to removal of the 
boundary constraints. 

Corollary 1c: Irrespective of the fading statistics and of
the full-rate lattice code applied, the complexity exponents
of MMSE-preprocessed lattice SD and of ML-based SD are
upper bounded as

$$c(r) \le \frac{T}{n_T}\Big(r(n_T - \lfloor r\rfloor - 1) + \big(n_T\lfloor r\rfloor - r(n_T - 1)\big)^+\Big), \qquad (24)$$

which simplifies to

$$c(r) \le \frac{T}{n_T}\,r(n_T - r) \qquad (25)$$

for integer $r$.

Proof: See Appendix B. ■

The above results revealed the very high, ML-like complex- 
ity of MMSE-preprocessed lattice decoding. Coming back to 
the main focus of the paper, and after reverting to the most 
general setting of MIMO scenarios, statistics and full-rate 
lattice codes, we proceed to show how proper utilization of 
lattice sphere decoding and LR techniques can indeed reduce 
the complexity exponent to zero, at an error-performance cost 
that vanishes in the high SNR limit. 

III. LR-Aided Regularized Lattice Sphere
Decoding Complexity

Lattice reduction techniques have been typically used in the 
MIMO setting to improve the error performance of suboptimal 
decoders (cf. Qll, Qll, see also ||20l, ED). In the current 
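setting the LR algorithm, which is employed at the receiver
after the action of MMSE preprocessing, modifies the search
of the MMSE-preprocessed lattice decoder, from

$$\hat{\mathbf{s}}_{r\text{-}ld} = \arg\min_{\hat{\mathbf{s}} \in \mathbb{Z}^{\kappa}}\|\mathbf{r} - \mathbf{R}\hat{\mathbf{s}}\|^2$$

(cf. (17)), to the new

$$\tilde{\mathbf{s}}_{lr\text{-}rld} = \arg\min_{\tilde{\mathbf{s}} \in \mathbb{Z}^{\kappa}}\|\mathbf{r} - \mathbf{R}\mathbf{T}\tilde{\mathbf{s}}\|^2 \qquad (26)$$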

by accepting as input the MMSE-preprocessed lattice generator
matrix $\mathbf{R}$, and producing as output the matrix $\mathbf{T} \in \mathbb{Z}^{\kappa \times \kappa}$
which is unimodular, meaning that it has integer coefficients
and unit-norm determinant, and which is designed so that
$\mathbf{R}\mathbf{T}$ is (loosely speaking) more orthogonal than $\mathbf{R}$. As a
result of this unimodularity, we have that $\mathbf{T}^{-1}\mathbb{Z}^{\kappa} = \mathbb{Z}^{\kappa}$,
and consequently the new search in (26) corresponds to yet
another lattice decoder, referred to as the LR-aided
MMSE-preprocessed lattice decoder, which operates over a generally
better conditioned channel matrix $\mathbf{R}\mathbf{T}$.
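One common way to obtain such a unimodular $\mathbf{T}$ is the LLL algorithm, which the paper adopts later on. The following is a textbook-style sketch that recomputes the Gram-Schmidt coefficients at every step for clarity rather than speed; the function names, the parameter `delta = 0.75`, and the random test matrix are assumptions made for illustration and do not reproduce the paper's specific use of LLL.

```python
import numpy as np

def gram_schmidt(B):
    """Return orthogonalized columns Bs and Gram-Schmidt coefficients mu for columns of B."""
    k = B.shape[1]
    Bs = np.zeros_like(B, dtype=float)
    mu = np.zeros((k, k))
    for i in range(k):
        Bs[:, i] = B[:, i]
        for j in range(i):
            mu[i, j] = (B[:, i] @ Bs[:, j]) / (Bs[:, j] @ Bs[:, j])
            Bs[:, i] -= mu[i, j] * Bs[:, j]
    return Bs, mu

def lll_reduce(B, delta=0.75):
    """Return (B_red, T) with T integer unimodular and B_red = B @ T (columns are basis vectors)."""
    B = np.array(B, dtype=float)
    k = B.shape[1]
    T = np.eye(k, dtype=int)
    i = 1
    while i < k:
        for j in range(i - 1, -1, -1):            # size-reduce column i against column j
            _, mu = gram_schmidt(B)
            q = int(np.rint(mu[i, j]))
            if q != 0:
                B[:, i] -= q * B[:, j]
                T[:, i] -= q * T[:, j]
        Bs, mu = gram_schmidt(B)
        lhs = Bs[:, i] @ Bs[:, i]
        rhs = (delta - mu[i, i - 1] ** 2) * (Bs[:, i - 1] @ Bs[:, i - 1])
        if lhs >= rhs:                            # Lovasz condition holds: move on
            i += 1
        else:                                     # otherwise swap columns and step back
            B[:, [i - 1, i]] = B[:, [i, i - 1]]
            T[:, [i - 1, i]] = T[:, [i, i - 1]]
            i = max(i - 1, 1)
    return B, T

rng = np.random.default_rng(1)
R0 = rng.standard_normal((4, 4))                  # hypothetical generator matrix
R_red, T = lll_reduce(R0)
print(np.allclose(R0 @ T, R_red), round(abs(np.linalg.det(T))))
```

Every update of `T` is an elementary integer column operation, which is why its determinant stays at plus or minus one.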

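Finally, with sphere decoding in mind, the LR algorithm
is followed by the QR decomposition^11 of the new
lattice-reduced MMSE-preprocessed matrix $\mathbf{R}\mathbf{T}$, resulting in a new
upper-triangular model

$$\tilde{\mathbf{r}} = \tilde{\mathbf{R}}\tilde{\mathbf{s}} + \mathbf{w}'' \qquad (27)$$

and in the new LR-aided MMSE-preprocessed lattice search,
which accepts the application of the sphere decoder, and which
takes the form

$$\tilde{\mathbf{s}}_{lr\text{-}rld} = \arg\min_{\tilde{\mathbf{s}} \in \mathbb{Z}^{\kappa}}\|\tilde{\mathbf{r}} - \tilde{\mathbf{R}}\tilde{\mathbf{s}}\|^2, \qquad (28)$$

where $\tilde{\mathbf{Q}}\tilde{\mathbf{R}} = \mathbf{R}\mathbf{T}$ corresponds to the QR-decomposition of
$\mathbf{R}\mathbf{T}$, where $\tilde{\mathbf{R}}$ is upper-triangular, where $\tilde{\mathbf{r}} = \tilde{\mathbf{Q}}^H\mathbf{r}$, $\tilde{\mathbf{s}} = \mathbf{T}^{-1}\mathbf{s}$,
and where $\mathbf{w}'' = \tilde{\mathbf{Q}}^H\mathbf{w}'$.
At the very end,

$$\hat{\mathbf{s}}_{lr\text{-}rld} = \mathbf{T}\tilde{\mathbf{s}}_{lr\text{-}rld} \qquad (29)$$

allows for calculation of the estimate of the transmitted symbol
vector $\mathbf{s}$ in (10).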

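We note here that this (exact) solution of the LR-aided MMSE-preprocessed lattice decoder defined by (28), (29), is identical to the exact solution of the MMSE-preprocessed lattice decoder, because
\begin{align}
\min_{\hat{s} \in \mathbb{Z}^{\kappa}} \| r - R\hat{s} \|^2
&= \min_{\hat{s} \in \mathbb{Z}^{\kappa}} \| r - RTT^{-1}\hat{s} \|^2 \notag \\
&\overset{(a)}{=} \min_{\hat{s} \in \mathbb{Z}^{\kappa}} \| r - \tilde{Q}\tilde{R}T^{-1}\hat{s} \|^2 \notag \\
&\overset{(b)}{=} \min_{\hat{s} \in \mathbb{Z}^{\kappa}} \| \tilde{r} - \tilde{R}T^{-1}\hat{s} \|^2 \notag \\
&= \min_{\tilde{s} \in T^{-1}\mathbb{Z}^{\kappa}} \| \tilde{r} - \tilde{R}\tilde{s} \|^2 \notag \\
&\overset{(c)}{=} \min_{\tilde{s} \in \mathbb{Z}^{\kappa}} \| \tilde{r} - \tilde{R}\tilde{s} \|^2 \tag{30}
\end{align}
where (a) follows from the fact that $\tilde{Q}\tilde{R} = RT$, (b) follows from the rotational invariance of the Euclidean norm, and (c) follows from the fact that $T^{-1}\mathbb{Z}^{\kappa} = \mathbb{Z}^{\kappa}$.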
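The chain of equalities in (30) can be checked numerically on a toy instance. The following is a minimal sketch, assuming NumPy, a randomly drawn upper-triangular stand-in for $R$, and a hand-built unimodular $T$ (none of these values come from the paper); the helper names `metric_original` and `metric_reduced` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
kappa = 4
R = np.triu(rng.normal(size=(kappa, kappa)))  # stand-in for the MMSE-preprocessed R
r = rng.normal(size=kappa)                    # stand-in for the preprocessed observation

# A unimodular T built as a product of unit-triangular integer matrices (det = 1).
L = np.eye(kappa, dtype=int); L[2, 0] = 3; L[3, 1] = -2
U = np.eye(kappa, dtype=int); U[0, 1] = 1; U[1, 3] = -4
T = L @ U
T_inv = np.rint(np.linalg.inv(T)).astype(int)  # integer inverse exists because |det T| = 1

# QR decomposition of the "reduced" matrix: Q_t R_t = R T, with R_t upper triangular.
Q_t, R_t = np.linalg.qr(R @ T)
r_t = Q_t.T @ r

def metric_original(s_hat):
    # Metric of the MMSE-preprocessed lattice decoder.
    return np.linalg.norm(r - R @ s_hat) ** 2

def metric_reduced(s_hat):
    # Metric of the LR-aided search, evaluated at s_tilde = T^{-1} s_hat.
    return np.linalg.norm(r_t - R_t @ (T_inv @ s_hat)) ** 2

s_samples = rng.integers(-5, 6, size=(200, kappa))
max_diff = max(abs(metric_original(s) - metric_reduced(s)) for s in s_samples)
```

Since every integer vector maps to an integer vector under $T^{-1}$ and the two metrics agree pointwise, both searches share the same minimum value, which is the content of (30).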

Although the two lattice decoding solutions (with and without LR) provide identical error performance in the setting of exact implementations, we proceed to show that, in terms of complexity, lattice reduction techniques, and specifically a proper utilization of the LLL algorithm [22], can provide dramatic improvements.
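For readers unfamiliar with the LLL algorithm, the following is a minimal textbook-style sketch, assuming NumPy, basis vectors stored as columns, and the common parameter choice $\delta = 0.75$. It is not the implementation analyzed in this work, it recomputes the Gram-Schmidt data at every step for clarity rather than speed, and the function names `gram_schmidt` and `lll_reduce` are hypothetical.

```python
import numpy as np

def gram_schmidt(B):
    """Gram-Schmidt orthogonalization of the columns of B (no normalization)."""
    n = B.shape[1]
    B_star = np.zeros_like(B, dtype=float)
    mu = np.zeros((n, n))
    for i in range(n):
        B_star[:, i] = B[:, i]
        for j in range(i):
            mu[i, j] = B[:, i] @ B_star[:, j] / (B_star[:, j] @ B_star[:, j])
            B_star[:, i] -= mu[i, j] * B_star[:, j]
    return B_star, mu

def lll_reduce(B, delta=0.75):
    """Return (B_red, T) with B_red = B @ T, T unimodular, B_red LLL-reduced."""
    B = np.array(B, dtype=float)
    n = B.shape[1]
    T = np.eye(n, dtype=int)
    k = 1
    while k < n:
        # Size-reduce column k against columns k-1, ..., 0.
        for j in range(k - 1, -1, -1):
            _, mu = gram_schmidt(B)
            q = int(np.rint(mu[k, j]))
            if q != 0:
                B[:, k] -= q * B[:, j]
                T[:, k] -= q * T[:, j]
        B_star, mu = gram_schmidt(B)
        lhs = B_star[:, k] @ B_star[:, k]
        rhs = (delta - mu[k, k - 1] ** 2) * (B_star[:, k - 1] @ B_star[:, k - 1])
        if lhs >= rhs:
            k += 1  # Lovasz condition holds, move on
        else:
            # Swap columns k-1 and k (a unimodular operation) and step back.
            B[:, [k - 1, k]] = B[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            k = max(k - 1, 1)
    return B, T

B_in = np.array([[1.0, -1.0, 3.0],
                 [1.0, 0.0, 5.0],
                 [1.0, 2.0, 6.0]])
B_red, T_lll = lll_reduce(B_in)
```

The returned `T_lll` plays the role of the unimodular matrix $T$ above: it has integer entries and determinant $\pm 1$, so the reduced basis generates the same lattice as the input basis.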

A. Complexity of the LR-aided regularized lattice sphere decoder

We are here interested in establishing the complexity of the LR-aided regularized lattice sphere decoder. Given that the costs of implementing MMSE preprocessing and of implementing the linear transformation in (29) are negligible in the scale of interest, we limit our focus to establishing the cost of lattice reduction, and then the cost of the SD implementation of the search in (28). Starting with the SD complexity, as in (20), we identify the corresponding unpruned set at layer $k$ to be
\[ \mathcal{N}_k \triangleq \{ \hat{s}_k \in \mathbb{Z}^k \mid \| \tilde{r}_k - \tilde{R}_k \hat{s}_k \|^2 \le \xi^2 \}, \tag{31} \]
and in bounding the size of the above, we first focus on understanding the statistical behavior of the $k \times k$ lower-right submatrices $\tilde{R}_k$ of matrix $\tilde{R}$ ($k = 1, \cdots, \kappa$), where we recall that $\tilde{R}$ is the upper triangular code-channel matrix, after MMSE preprocessing and LLL lattice reduction. Towards this, and for $d_L(r - \epsilon)$ denoting the diversity gain of the exact implementation of the regularized lattice decoder at multiplexing gain $r - \epsilon$, we have the following lemma on the smallest singular value of $\tilde{R}_k$. The proof appears in Appendix C.

Footnote: A more proper statement would be that the QR decomposition is performed by the LR algorithm itself.

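Lemma 1: The smallest singular value $\sigma_{\min}(\tilde{R}_k)$ of submatrix $\tilde{R}_k$, $k = 1, \cdots, \kappa$, satisfies
\[ P\left( \sigma_{\min}(\tilde{R}_k) < \rho^{-\frac{\epsilon T}{\kappa}} \right) \,\dot{\le}\, \rho^{-d_L(r-\epsilon)}, \quad \text{for all } r > \epsilon > 0. \tag{32} \]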



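To bound the cardinality $N_k$ of $\mathcal{N}_k$ (cf. (31)), and eventually the total number $N_{SD} = \sum_{k=1}^{\kappa} N_k$ of lattice points visited by the SD, we proceed along the lines of the work in [1], making the proper modifications to account for MMSE preprocessing, for the removal of the bounding region, and for lattice reduction.

Towards this we see that, after removing the boundary constraint, Lemma 1 in [1] tells us that
\[ N_k \triangleq |\mathcal{N}_k| \le \prod_{i=1}^{k} \left[ \sqrt{k} + \frac{2\xi}{\sigma_i(\tilde{R}_k)} \right] \tag{33} \]
where
\[ \sigma_{\min}(\tilde{R}_k) = \sigma_1(\tilde{R}_k) \le \cdots \le \sigma_k(\tilde{R}_k) \]
are the singular values of $\tilde{R}_k$. Consequently we have that
\[ N_k \le \left[ \sqrt{k} + \frac{2\xi}{\sigma_{\min}(\tilde{R}_k)} \right]^{k}. \tag{34} \]
As a result, for any $\tilde{R}_k$ such that
\[ \sigma_{\min}(\tilde{R}_k) \ge \rho^{-\frac{\epsilon T}{\kappa}} \]
and given that $\xi = \sqrt{z \log \rho}$ for some finite $z$, then
\[ N_k \le \left( \sqrt{k} + 2\sqrt{z \log \rho}\, \rho^{\frac{\epsilon T}{\kappa}} \right)^{k} \doteq \rho^{\frac{\epsilon T k}{\kappa}}, \tag{35} \]
which guarantees that the total number of visited lattice points is upper bounded as
\[ N_{SD} = \sum_{k=1}^{\kappa} N_k \,\dot{\le}\, \sum_{k=1}^{\kappa} \rho^{\frac{\epsilon T k}{\kappa}} \doteq \rho^{\epsilon T}. \tag{36} \]
Consequently, directly from Lemma 1 we have that
\[ P\left( N_{SD} \,\dot{\ge}\, \rho^{\epsilon T} \right) \,\dot{\le}\, \rho^{-d_L(r-\epsilon)}. \tag{37} \]

Footnote: Even though the work here focuses on decoding, we can also quickly state the obvious fact that the cost of constructing the codewords is also negligible in the scale of interest because it again only involves a finite-dimensional linear transformation (cf. (6)).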
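The per-layer bound, a product over singular values that is then relaxed to a power of the smallest singular value, can be evaluated numerically for any given upper-triangular matrix. The following is a minimal sketch assuming NumPy; the function name `layer_bounds` and the example inputs are hypothetical and serve only to illustrate how the two bounds relate.

```python
import numpy as np

def layer_bounds(R_tilde, xi):
    """For each layer k, return the product bound and the looser power bound
    on the number of lattice points in the unpruned set at that layer."""
    kappa = R_tilde.shape[0]
    product_bounds, power_bounds = [], []
    for k in range(1, kappa + 1):
        R_k = R_tilde[kappa - k:, kappa - k:]          # k x k lower-right submatrix
        sigma = np.linalg.svd(R_k, compute_uv=False)   # singular values of R_k
        product_bounds.append(float(np.prod(np.sqrt(k) + 2.0 * xi / sigma)))
        power_bounds.append(float((np.sqrt(k) + 2.0 * xi / sigma.min()) ** k))
    return product_bounds, power_bounds

# Example: identity matrix and unit search radius (illustrative values only).
prod_b, pow_b = layer_bounds(np.eye(3), xi=1.0)
```

Because every factor of the product is at most the factor involving $\sigma_{\min}$, the power bound always dominates the product bound, which is the step that allows Lemma 1 to control all layers through a single singular value.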



A similar approach deals with the complexity of the LLL algorithm, which is known (cf. [23]) to be generally unbounded.
Specifically drawing from JS] Lemma 2], under the natural 
assumption of power-limited channelo (cf- ID), under the 
natural assumption that d^lr — e) > dL{r) for all e > 0, 
and for Nlb. denoting the number of flops spent by the LLL 
algorithm, one can readily conclude that 



P{NLB,>jlogp) <p-dL{r-.)^ 



(38) 



for any 7 > ^{dL{r—e)). Consequently the overall complexity 

N ^ Nsd + Nlr, 

in flops, for the LR-aided MMSE preprocessed lattice sphere 
decoder, satisfies the following 

V{N>p''^) = V{{Nsd~>p'^}^{Nlb>p'^}) 

< p-'^^'^'-''\ (39) 

Now going back to (5), and having in mind appropriate timeout policies that bound $N_{\max}$ while at the same time specifically guaranteeing a vanishing error performance gap to the exact solution of regularized lattice decoding, we can see that the complexity exponent $c(r)$ takes the equivalent form recently introduced (for the ML case) in [1]
\[ c(r) = \inf\left\{ x \;\Big|\; -\lim_{\rho \to \infty} \frac{\log P(N \ge \rho^{x})}{\log \rho} > d_L(r) \right\}. \tag{40} \]

To see this we quickly note that for iVmax = P^ where 
X = c{r) — 5 for any (5 > 0, it is the case that (cf. (|9]l) 

Finally applying ( |39] l we see that for any positive ei < e, 
it is the case that 

\ogV {N> p^'^+'i) 
c(r) = inf{e | - lim ^ \ ~ > diir)} (41) 

p^oo log/9 

which vanishes arbitrarily close to zero, resulting in a zero 
complexity exponent. 

What remains is to consider the error-performance gap in the presence of the LR-aided regularized lattice SD with a time-out policy that interrupts at $N_{\max} = \rho^{x}$ for any vanishingly small $x > 0$.

"This is a moderate assumption that asks that E {||JJ"||p} < p. We note 
that this holds true for any telecommunications setting. 



B. Gap to the exact solution of MMSE-preprocessed lattice 
decoding 

We here prove that the LR-aided regularized lattice sphere 
decoder and the associated time-out policies that guarantee a 
vanishing complexity exponent, also guarantee a vanishing gap 
to the error performance of the exact lattice decoding imple- 
mentation. This result is motivated by potentially exponential 
gaps in the performance of other DMT optimal decoders (cf. 
ID), where these gaps may grow exponentially up to 2^^ (cf. 
Il24l ) or may potentially be unbounded jZSl . 

Towards establishing this gap, we recall that the exact MMSE-preprocessed lattice decoder makes errors when $\hat{s}_{rld} \ne s$. On the other hand the LLL-reduced MMSE-preprocessed lattice sphere decoder with run-time constraints, in addition to making the same errors ($\hat{s}_{lr\text{-}rld} \ne s$), also makes errors when the run-time limit of $\rho^{x}$ flops becomes active, i.e., when $N \ge \rho^{x}$, as well as when a small search radius causes $\mathcal{N}_{\kappa} = \emptyset$. Consequently the corresponding performance gap to the exact regularized decoder takes the form
\[ g_L(x) = \lim_{\rho \to \infty} \frac{P\left( \{\hat{s}_{lr\text{-}rld} \ne s\} \cup \{N \ge \rho^{x}\} \cup \{\mathcal{N}_{\kappa} = \emptyset\} \right)}{P(\hat{s}_{rld} \ne s)}. \]

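To bound the above gap, we apply the union bound and the fact that
\[ P(\mathcal{N}_{\kappa} = \emptyset) \le P(\|w''\| > \xi) \]
to get that
\[ g_L(x) \le \lim_{\rho \to \infty} \frac{P(\hat{s}_{lr\text{-}rld} \ne s)}{P(\hat{s}_{rld} \ne s)} + \lim_{\rho \to \infty} \frac{P(N \ge \rho^{x})}{P(\hat{s}_{rld} \ne s)} + \lim_{\rho \to \infty} \frac{P(\|w''\| > \xi)}{P(\hat{s}_{rld} \ne s)}. \tag{42} \]
Furthermore from (30) we observe that
\[ P(\hat{s}_{lr\text{-}rld} \ne s) = P(\hat{s}_{rld} \ne s), \tag{43} \]
and from (39) we recall that
\[ P\left( N \ge \rho^{\epsilon T} \right) \,\dot{\le}\, \rho^{-d_L(r-\epsilon)}, \]
which implies that for any $x > 0$ it holds that
\[ \lim_{\rho \to \infty} \frac{P(N \ge \rho^{x})}{P(\hat{s}_{rld} \ne s)} = 0. \tag{44} \]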

Finally the last term in (42) relates to the search radius $\xi$, and to the behavior of the noise $w''$, which takes the form (cf. (27))
\[ w'' = \tilde{Q}^H \left( -\alpha_r^2 R^{-H} s + R^{-H} M^H w \right). \tag{45} \]
The following lemma, whose proof is found in Appendix D, accounts for the fact that $w''$ includes self-interference and colored noise, to bound the last term in (42).

Lemma 2: There exists a finite $z > d_L(r)$ for which a search radius $\xi = \sqrt{z \log \rho}$ guarantees that
\[ \lim_{\rho \to \infty} \frac{P(\|w''\| > \xi)}{P(\hat{s}_{rld} \ne s)} = 0. \tag{46} \]
Consequently combining (43), (44) and (46) gives that $g_L(x) = 1$, $\forall x > 0$. The following directly holds.



Theorem 2: LR-aided MMSE-preprocessed lattice sphere 
decoding with a computational constraint activated at p^ flops, 
allows for a vanishing gap to the exact solution of MMSE- 
preprocessed lattice decoding, for any x > 0. Equivalently the 
same LR-aided decoder guarantees that 

9L(i-) = 1 and lim = Ve > 0, g > 1, 

p-i-oo log p 

for all fading statistics, all MIMO scenarios, and all full-rate 
lattice codes. 
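To illustrate how a time-out policy interacts with the sphere search, the following is a minimal sketch of a depth-first sphere decoder over an upper-triangular model with a cap on the number of visited nodes. It assumes NumPy, uses a node budget as a simple proxy for the flop budget $\rho^{x}$ of the theorem, and is not the decoder implementation analyzed in this work; the names `sphere_decode` and `max_nodes` are hypothetical.

```python
import numpy as np

def sphere_decode(R, r, xi, max_nodes):
    """Search for the integer s minimizing ||r - R s|| within radius xi.
    R must be upper triangular with nonzero diagonal.
    Returns (s or None, visited_nodes, timed_out)."""
    kappa = R.shape[0]
    best = {"s": None, "d2": xi ** 2}
    state = {"visited": 0, "timed_out": False}
    s = np.zeros(kappa, dtype=int)

    def search(level, partial_d2):
        # Unconstrained real-valued solution for this component given the ones above it.
        c = (r[level] - R[level, level + 1:] @ s[level + 1:]) / R[level, level]
        half = np.sqrt(max(best["d2"] - partial_d2, 0.0)) / abs(R[level, level])
        lo, hi = int(np.ceil(c - half)), int(np.floor(c + half))
        # Visit candidates closest to c first.
        for cand in sorted(range(lo, hi + 1), key=lambda v: abs(v - c)):
            if state["timed_out"]:
                return
            if state["visited"] >= max_nodes:
                state["timed_out"] = True   # time-out policy becomes active
                return
            state["visited"] += 1
            s[level] = cand
            d2 = partial_d2 + (R[level, level] * (cand - c)) ** 2
            if d2 > best["d2"]:
                continue                    # pruned: outside the current sphere
            if level == 0:
                best["s"], best["d2"] = s.copy(), d2
            else:
                search(level - 1, d2)

    search(kappa - 1, 0.0)
    return best["s"], state["visited"], state["timed_out"]

# Illustrative instance (values are assumptions, not taken from the paper).
R_ex = np.array([[2.0, 0.3, -0.5], [0.0, 1.5, 0.4], [0.0, 0.0, 1.0]])
s_true = np.array([1, -2, 3])
r_ex = R_ex @ s_true + np.array([0.1, -0.05, 0.08])
s_hat, n_visited, timed_out = sphere_decode(R_ex, r_ex, xi=1.0, max_nodes=1000)
```

The three error events in the gap analysis map onto this sketch as follows: a wrong but completed search corresponds to $\hat{s}_{lr\text{-}rld} \ne s$, `timed_out = True` corresponds to $N \ge \rho^{x}$, and a returned `None` without time-out corresponds to an empty search sphere.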



IV. Conclusions 

The work identified the first lattice decoding solution that 
achieves, in the most general outage-limited MIMO setting 
and the high rate and high SNR limit, both a vanishing gap 
to the error-performance of the (DMT optimal) exact solution 
of preprocessed lattice decoding, as well as a computational 
complexity that is subexponential in the number of codeword 
bits. The proposed solution employs lattice reduction (LR)- 
aided regularized lattice sphere decoding and proper timeout 
policies. As it turns out, lattice reduction is a special ingre- 
dient that allows for complexity reductions; a role that was 
rigorously demonstrated here for the first time, by proving 
that without lattice reduction, for most common codes, the 
complexity cost for asymptotically optimal regularized lattice 
sphere decoding is exponential in the number of codeword 
bits, and in many cases it in fact matches the complexity cost 
of ML sphere decoding. 

Prior to this work, a vanishing error performance gap was generally attributed only to near-full lattice searches that have exponential complexity, whereas subexponential complexity was generally attributed to early-terminated (linear) solutions whose performance gap can be up to exponential in dimension and/or rate. In light of this, the work constitutes the first proof that subexponential complexity need not come at the cost of exponential reductions in lattice decoding error performance.

Appendix A
Proof for Theorem 1 and Corollary 1a

In the following we begin by providing an upper bound on the complexity exponent of MMSE-preprocessed (unconstrained) lattice sphere decoding, where this bound holds for the general quasi-static MIMO channel, for all fading statistics and for any full-rate lattice code. We will then proceed to provide a lower bound on the complexity exponent of the same decoder, where this bound, under the extra assumptions of regular i.i.d. fading statistics and of layered codes, will in fact match the above mentioned upper bound to prove the theorem and the associated corollaries. Before proceeding with the bounds, we describe the $n_T \times n_R$ ($n_R \ge n_T$) quasi-static point-to-point MIMO channel, and its corresponding association to the general MIMO channel model and decoding metric introduced earlier.

The aforementioned quasi-static channel model takes the form
\[ Y_C = \sqrt{\rho}\, H_C X_C + W_C \tag{47} \]
where $X_C \in \mathbb{C}^{n_T \times T}$, $Y_C \in \mathbb{C}^{n_R \times T}$ and $W_C \in \mathbb{C}^{n_R \times T}$ represent the transmitted, received and noise signals over a period of $T$ time slots, and where $H_C \in \mathbb{C}^{n_R \times n_T}$ represents the matrix of fade coefficients. The real-valued representation of (47) can be written as



MMSE-preprocessed lattice sphere decoder, is upper bounded 

as 



fc 



Nk = \Mk\ < n 



2k 



2^ 



o'i(Rfc 



(57) 



Y;, = Vi^HflX, 



Wr 



(48) 



where Yt 



-S{Yc} 



3{Hc} 5R{Hc} 



5{Yc} 5R{Yc} 

5i{Xc} 



H 



R — 



where ai{Ilk), i = 1, • • • ,k denote the singular values of R^. 
in increasing order 

Towards lower bounding (Ti(Rk), we note that 



, X, 



3{Xc} 
3{Xc} 5R{Xc} 



<7,(Rfc)>a,(R)=a,(M'^'^ 



v2+a,(M^M), (58) 



and W/f = 



5R{Wc} -3?{Wc} 
3{Wc} 3f?{Wc} 



, and subsequent 



where the first inequality makes use of the interlacing property 
of singular values of sub-matrices 1261 . Furthermore for 



vectorization gives the real-valued model 

y = Vp(It ® Hfljx + w 



(49) 



^ loga,-(HgHc) . ^ 
log/9 



(59) 



where $y = \mathrm{vec}(Y_R)$, $x = \mathrm{vec}(X_R)$, and $w = \mathrm{vec}(W_R)$. The system model in (49) is of the familiar form
\[ y = \sqrt{\rho}\, H x + w \tag{50} \]
as in (1) with $m = 2n_T T$, $n = 2n_R T$, and where
\[ H = I_T \otimes H_R. \tag{51} \]
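The passage from the complex matrix model to the real-valued vectorized model with a Kronecker-structured channel can be verified numerically. The following is a minimal sketch assuming NumPy, column-wise vectorization, and the convention that real and imaginary parts of the signal matrices are stacked vertically (an assumption chosen so that the dimensions $m = 2n_T T$ and $n = 2n_R T$ come out as stated); the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_T, n_R, T, rho = 2, 3, 4, 10.0

def crandn(*shape):
    # Circularly symmetric complex Gaussian entries (illustrative choice).
    return (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2)

H_c, X_c, W_c = crandn(n_R, n_T), crandn(n_T, T), crandn(n_R, T)
Y_c = np.sqrt(rho) * H_c @ X_c + W_c           # complex quasi-static model

def real_matrix(A):
    # Real-valued representation of a complex channel matrix.
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

def real_stack(A):
    # Real-valued representation of a complex signal matrix (assumed convention).
    return np.vstack([A.real, A.imag])

def vec(A):
    # Column-wise vectorization.
    return A.reshape(-1, order="F")

H_R = real_matrix(H_c)
H = np.kron(np.eye(T), H_R)                     # H = I_T (x) H_R
x, w = vec(real_stack(X_c)), vec(real_stack(W_c))
y = np.sqrt(rho) * H @ x + w                    # vectorized real-valued model
```

The Kronecker structure is also what produces the $T$-fold multiplicity of singular values that the complexity bounds in this appendix rely on.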

As before the vectorized codewords x, associated to the full- 
rate code, take the form 



"Gs, seZT\p—n, 



(52) 



where k = 2min{nT, np{}T = 2nTT = m, which allows us 
to rewrite the model as 



y = Ms + wr, 



for 



M 



HG 



^(l2 



H„)G. 



(53) 



(54) 



Finally the corresponding coherent MMSE-preprocessed lattice decoder for the transmitted symbol vector $s$ can be expressed to be (cf. the metric introduced earlier)
\[ \hat{s}_{rld} = \arg\min_{\hat{s} \in \mathbb{Z}^{\kappa}} \| r - R\hat{s} \|^2, \tag{55} \]
where $r = Q_1^H y$ and $R \in \mathbb{C}^{\kappa \times \kappa}$ is the upper-triangular matrix, where furthermore both $Q_1$ and $R$ result from the thin QR decomposition of the $(n + \kappa) \times \kappa$ dimensional preprocessed channel matrix



and where as before a 



M 



= QR = 



Qi 

Q2 



R 



(56) 



p » 



A. Upper bound on complexity of regularized lattice SD

In establishing the upper bound, we consider Lemma 1 in [1], which we properly modify to account for MMSE preprocessing and for the removal of the constellation boundaries, and get that the number $N_k$ of nodes visited at layer $k$ by the



and /ii > • • • > /iny, we see that (Tj(Hc) = p ^''j, and from 
that 

CT,(M) > p^-'^CTn,in(G)CT(,)(lT Hfl)) 

1 rT 

= P^~~cr;2T(»)(Hc) 



2T{i)\ 



(60) 



where Irii) — [5^], and where the asymptotic equality is due 
to the fact that crmin(G) = p". Substituting from (|60]l in (|58] ) 
we now have that 



Corresponding to ( |57] i we see that 



(61) 



2k 



2e 



< 



,(^-5(l-Mi2TW) + ) 



+\+ 



for any z = 1, • • • , 2nTT, and from ( l57l ) we have that 

Nk{^J■) < p^^=i ('^-5(i-a'<2tw)+)^^ (62) 

where fi = {pi, ■ ■ ■ , Pm-)- It follows that 

fc=i 

= P" 



fe=l 

,+ 



where the last asymptotic equality is due to the multiplicity 
of the singular values. 
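Now consider the set
\[ \mathcal{T}(x) \triangleq \left\{ \mu \;\Big|\; T \sum_{j=1}^{n_T} \left( \frac{r}{n_T} - (1 - \mu_j)^+ \right)^+ \ge x \right\} \tag{64} \]
and note that for any $y < x$, then (63) and $\mu \notin \mathcal{T}(y)$ jointly imply that $N_{SD} \,\dot{<}\, \rho^{x}$, which in turn implies that $P(\mu \notin \mathcal{T}(y)) \le P(N_{SD} < \rho^{x})$ and consequently that
\[ -\lim_{\rho \to \infty} \frac{\log P(N_{SD} \ge \rho^{x})}{\log \rho} \ge -\lim_{\rho \to \infty} \frac{\log P(\mu \in \mathcal{T}(y))}{\log \rho}. \tag{65} \]
In evaluating the right hand side of (65) we note that $\mathcal{T}(y)$ is a closed set and thus, applying the large deviation principle (cf. [27]), we have that
\[ -\lim_{\rho \to \infty} \frac{\log P(\mu \in \mathcal{T}(y))}{\log \rho} \ge \inf_{\mu \in \mathcal{T}(y)} I(\mu) \tag{66} \]
for some rate function $I(\mu)$. Consequently from (65) and (66) it follows that
\[ -\lim_{\rho \to \infty} \frac{\log P(N_{SD} \ge \rho^{x})}{\log \rho} \ge \inf_{\mu \in \mathcal{T}(y)} I(\mu). \tag{67} \]
This lower bound specified in (67) holds for any $y < x$. Consequently to get the tightest possible bound, we need to find $\sup_{y < x} \inf_{\mu \in \mathcal{T}(y)} I(\mu)$. As $\inf_{\mu \in \mathcal{T}(y)} I(\mu)$ is non-decreasing and left-continuous in $y$, it follows that
\[ \sup_{y < x} \inf_{\mu \in \mathcal{T}(y)} I(\mu) = \inf_{\mu \in \mathcal{T}(x)} I(\mu). \]
Consequently
\[ -\lim_{\rho \to \infty} \frac{\log P(N_{SD} \ge \rho^{x})}{\log \rho} \ge \inf_{\mu \in \mathcal{T}(x)} I(\mu), \tag{68} \]
which in conjunction with (40) gives that
\begin{align}
c_{rld}(r) \le \overline{c}_{rld}(r) &\triangleq \inf\{ x \mid \inf_{\mu \in \mathcal{T}(x)} I(\mu) > d_L(r) \} \notag \\
&= \sup\{ x \mid \inf_{\mu \in \mathcal{T}(x)} I(\mu) \le d_L(r) \} \notag \\
&= \max\{ x \mid \inf_{\mu \in \mathcal{T}(x)} I(\mu) \le d_L(r) \} \tag{69}
\end{align}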

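where the above follows from the aforementioned fact that
\[ -\lim_{\rho \to \infty} \frac{\log P(N_{SD} \ge \rho^{x})}{\log \rho} \]
(and by extension also $\inf_{\mu \in \mathcal{T}(x)} I(\mu)$) is continuous and nondecreasing in $x$, and from the fact that $\mathcal{T}(x)$ is a closed set. Consequently $\overline{c}_{rld}(r)$ takes the form
\begin{align}
\overline{c}_{rld}(r) = \max\; & x \tag{70a} \\
\text{s.t. } & T \sum_{j=1}^{n_T} \left( \frac{r}{n_T} - (1 - \mu_j)^+ \right)^+ \ge x, \tag{70b} \\
& I(\mu) \le d_L(r), \tag{70c} \\
& \mu_1 \ge \cdots \ge \mu_{n_T} \ge 0. \tag{70d}
\end{align}
Furthermore since $\mathcal{T}(x)$ is a closed set, the maximum $x$ in (70) must be such that (70b) is satisfied with equality, in which case $\overline{c}_{rld}(r)$ can be obtained as the solution to a constrained maximization problem according to
\begin{align}
\overline{c}_{rld}(r) = \max_{\mu}\; & T \sum_{j=1}^{n_T} \left( \frac{r}{n_T} - (1 - \mu_j)^+ \right)^+ \tag{71a} \\
\text{s.t. } & I(\mu) \le d_L(r), \tag{71b} \\
& \mu_1 \ge \cdots \ge \mu_{n_T} \ge 0. \tag{71c}
\end{align}
Equivalently for $\mu^* = (\mu_1^*, \cdots, \mu_{n_T}^*)$ being one of the maximizing vectors, i.e., such that $\mu^* \in \mathcal{T}(x)$ and $I(\mu^*) = d_L(r)$, then $\overline{c}_{rld}(r)$ takes the form
\[ \overline{c}_{rld}(r) = T \sum_{j=1}^{n_T} \left( \frac{r}{n_T} - (1 - \mu_j^*)^+ \right)^+. \tag{72} \]

Footnote: In general, (71) does not have a unique optimal point because $(a)^+$ is constant in $a$ for $a < 0$.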



As we will now show, the above bound is also shared by the ML-based sphere decoder, with or without MMSE preprocessing, irrespective of the full-rate code and the fading statistics. Directly from [1, Theorem 2], and taking into consideration that MMSE-preprocessed lattice decoding is DMT optimal for any code, we recall that the equivalent upper bound for the ML-based sphere decoder, without MMSE preprocessing, takes the form
\begin{align}
\overline{c}_{ml}(r) = \max_{\mu}\; & T \sum_{j=1}^{n_T} \min\left( \frac{r}{n_T} - 1 + \mu_j,\; \frac{r}{n_T} \right)^+ \tag{73a} \\
\text{s.t. } & I(\mu) \le d_L(r), \tag{73b} \\
& \mu_1 \ge \cdots \ge \mu_{n_T} \ge 0. \tag{73c}
\end{align}
Comparing (71) and (73) we are able to conclude that both the objective functions (71a) and (73a) as well as both pairs of constraints are identical. To see this, we first note that for $0 \le \mu_j \le 1$, then
\[ \min\left( \frac{r}{n_T} - 1 + \mu_j,\; \frac{r}{n_T} \right)^+ = \left( \frac{r}{n_T} - 1 + \mu_j \right)^+ = \left( \frac{r}{n_T} - (1 - \mu_j)^+ \right)^+ \]
and furthermore we note that for $\mu_j > 1$, then
\[ \min\left( \frac{r}{n_T} - 1 + \mu_j,\; \frac{r}{n_T} \right)^+ = \left( \frac{r}{n_T} - (1 - \mu_j)^+ \right)^+ = \frac{r}{n_T}, \]
which proves that $\overline{c}_{ml}(r)$ and $\overline{c}_{rld}(r)$ are identical.

In considering the case of MMSE-preprocessed ML SD, it is easy to see that the summands in the objective function in (73a) will be modified to take the form $\min\left( \frac{r}{n_T} - (1 - \mu_j)^+,\; \frac{r}{n_T} \right)^+$, which can be seen to match (71a) for all $\mu_j \ge 0$, which in turn concludes the proof that the upper bound $\overline{c}_{rld}(r)$ for MMSE-preprocessed lattice SD is also shared by the ML-based sphere decoder, with or without MMSE preprocessing, irrespective of the full-rate code, and for all fade statistics represented by monotonic rate functions.
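The equality between the ML summand and the lattice-decoding summand can be confirmed on a grid of values. The following is a minimal sketch assuming NumPy, with `a` standing for $r/n_T$; the function names and the grid ranges are hypothetical.

```python
import numpy as np

def ml_summand(mu, a):
    # min(a - 1 + mu, a)^+, the summand of the ML-based objective.
    return np.maximum(np.minimum(a - 1.0 + mu, a), 0.0)

def lattice_summand(mu, a):
    # (a - (1 - mu)^+)^+, the summand of the regularized lattice objective.
    return np.maximum(a - np.maximum(1.0 - mu, 0.0), 0.0)

mu_grid = np.linspace(0.0, 3.0, 301)   # mu_j >= 0
a_grid = np.linspace(0.0, 1.0, 21)     # a = r / n_T in [0, 1]
max_gap = max(
    float(np.max(np.abs(ml_summand(mu_grid, a) - lattice_summand(mu_grid, a))))
    for a in a_grid
)
```

The two expressions agree up to floating-point rounding over the whole grid, covering both the $0 \le \mu_j \le 1$ case and the $\mu_j > 1$ case treated above.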

B. Lower bound on complexity of regularized lattice SD 

We will here, under the extra assumptions of regular i.i.d. fading statistics and of layered codes with natural decoding order, provide a lower bound that matches the upper bound in (72). The same bound and tightness will also apply to any full-rate code, under the assumption of a fixed, worst case decoding ordering.

The goal here is to show that at layer $k = 2qT$, for some $q \in [1, n_T]$, the sphere decoder visits close to $\rho^{\overline{c}_{rld}(r)}$ nodes with a probability that is large compared to the probability of decoding error $P(\hat{s}_{rld} \ne s) \doteq \rho^{-d_L(r)}$, which from the expression of the complexity exponent (40) will prove that
\[ c_{rld}(r) = \overline{c}_{rld}(r). \]

Going back to ( |72] |. we let q be the largest integer for which 



— -(i-m;)+>o, 



(74) 



in which case 



takes the form 



Since 



Cr-id{r) = T 



g 

■sr-^ r 






TlT 



(1-M-r 



(75) 



We recall from ^ that ^^ = - ^^^^^isff^^ ' J = 
1, • • • ,TiT, and that fi* G T(a;) satisfies I{fJ.*) = dL(f) and 
maximizes ( 17 lab . We also note that without loss of generality 
we can assume that g > 1 as otherwise Cr-id{r) ~ (cf. 
(|72]|). Consequently it is the case that /i* > for j = 1, • • • , g. 
Furthermore given the monotonicity of the rate function /(/i.), 
and the fact that the objective function in (ItTI i does not increase 
in /ij beyond /ij = 1, we may also assume without loss of 
generality that [i* <\ for j = 1, • • • , tt-t- 

As in [11 we proceed to define two events J7i and O2 which 
we will prove to be jointly sufficient so that, at layer fc = 2qT, 
the sphere decoder visits close to p'^^-^idir) nodes. These are 
given by 



HfHfl = V(diag{ai(HfHH),--- ,'T2„,(HgHfl,)})V^ 
= V(diag{ai(HfHH),---,'T2„,(HfH«)} 
- a(2,+i)(Hf Hfi)diag{0^_^, V^})V^ 

2(j 2p 

+ a(2,+i) (Hf H;v,)V(diag{0, • • • , 0, 1, • • • , 1})V^, 

2q 2p 

we have that 

Hf Hfl ^ a(2,+i) (HgHfl)V(diag{0, • • • , 0, 1, • • • , 1})V^ 



2g 



2p 



a(2,+i)(HfHfl)V(diag{0,--- ,0,1,--- ,1}) 



2q 



2p 



n, ^ {^i* -26< /i, <^J^-s,J^l,■■■ ,q 

< fij <S,j = q + l,--- ,nT}, 
for a given small (5 > 0, and 

02^{ai((lT®Vf)G|p)>u}, 



(76) 



(77) 



for some given m > 0, where for p ~ nx — q then G|p 
denotes the first 2pT columns of G, and where Vp denotes 
the last 2p columns of V obtained by applying the singular 
value decomposition on Hu, i.e., Hr = USV^, where 

S = diag{cri(Hi^),--- , <T2nT {'^r)} 

with 0-1 (Hi?) < • • • < o-2nT(HH) and VV^ = I. Hence, V^ 
corresponds to the 2p largest singular values of H^. 

Note also that by choosing S sufficiently small, and using 
the fact that fi* > for i = 1, • • • ,q, we may without loss 
of generality assume that 17 1 implies that /ij > for all j = 
I,--- ,nT- 

Modifying the approach in U^, Theorem 1] to account for 
MMSE preprocessing and unconstrained decoding, the lower 
bound on the number of nodes visited at layer k by the sphere 
decoder, is given by 



(diag{0^_^,V_^})V^ 

2g 2p 

= 0-(2g+l)(Hj,.H_R)VpVp 

where the last equality follows from the fact that Vp contains 
the last 2p columns of V and where A ^ B denotes that 
A— B is positive-semidefinite. Since o'i(H^H) e K and since 
the Kronecker product induces singular value multiplicity, it 
follows that 



^ \P ' \P 



y 



A-- 



P' » a(2,+i)(H^H^)G^(lT®VpV-)G|p + a^I. 

rec 



p 



With respect to the smallest singular value of (M[^^)^M'^'^^ 
we have 



0-1 



(Gg(lT®VpVf)G| 



and consequently, given that Hji G il2, we have that 



ai(M[;^)>p-^ 



P " P 



'pf^/.(29+i)(HgHc) + l 



>P 



-r^ + Ul-S)+ 



(79) 



iv.>n 



2e 



VfccTl (R/c 



^ 



(78) 



In the following, and up until (iMt . we will work towards upper 
bounding tTi(Rfe) so that we can then lower bound N^. 
Towards this let 



where the first inequality follows from dTT] ). the exponential 
equality follows from ( |59] l and from the fact that m > is 
fixed and independent of p, and the last inequality follows 
from ( l76l l. 

From (l54b we have that 



yf-9 A 
\p 



HG 



arl\p 



\p 



p2(nB+nT)Tx2pT 



CJ^{M^^') < P 
= P 






1 = 1, 



, 2nTT, (80) 



contain the first 2pT columns of M'''^^ from 
that 



and note 



(m\;Tm];^ = p' 

and that from ( fSTT l we get 



"G|pH HG|p 



ail 



where the asymptotic equality follows from the fact that 
(Ti^{G) is fixed and independent of p. Furthermore ( l76b gives 
that for i = 1, • • • , 2qT then 



^.(M'^^^) < p' 



-+5+i(i-A<r2^(,))+ 



(81) 



(M-^9^H 



'Y 



\p 



^G^(lT®HfHfl)G|„ + a2i. 



where we have made use of the fact that /i* < 1 for j 
I,-- • ,nT. 



Given that ^* > for j = 1, • • • ,q, then for sufficiently 
small 5 and for z = 1, • • • , 2qT, we have that 

rT \ rT \ 

which means that for sufficiently small 5, a comparison of ( |79] l 
and dSTT ) yields 

ai(M-s)<ai(M[;'?), 

for i := 1, • • • , 2qT. The above inequality allows us to apply 
Lemma 3 in [I], which in turn gives that 



Toward this we note that as ( |76] l and ( TTTT i imply that Nsd > 

^c,-M(r-)-3gT5^ it folloWS that 



P UVSD > P' 



CT-ld{r)-MTS 



>P(f^inf^2)=P(^l)P(^2) 



o-i(R-fe) < 



(T.(M'-'^9) 

^i(M[;^) 



^,(M'-^s), 



(82) 



for i = I,--- ,2<?T. 

Setting i = K in dSOl l upper bounds the maximum singular 
value of M'''^^ as 



where the equaUty follows from the i.i.d. assumption on the 
entries in He, which makes the singular values of H^Hp 
independent of the singular vectors of H^Hp ll28l . 1291 . and 
which in turn also implies independence of the singular values 
of HpH(7 (event ili) from the singular vectors of H^Hj^ 
(event J72)- 

We now turn to H] Lemma 2] and recall that for the layered 
codes assumed here, as well as for any full-rate design and 
some non-random fixed decoding ordering (corresponding to a 
permutation of the columns of G), there exists a unitary matrix 
Wp such that rank('(lT ® (%')^)G|p] = 2pT i.e., that 

'\H\ 



cr^(M'^'=9) < p-if + 3(1-M"t)+ < p-2 



ai (It®(%)^)G|J>0 



(83) 



where the last inequality is due to the fact that jij > 0. 
Consequently combining (l83T l and ( |79] l gives that 






< p^^ 



which together with (ISTI i and (|82] i gives that 

ai(Rfc) <p"'^+5^+^^'"^'^^'"^^, i = l,---,2qT. (84) 
Consequently, going back to ( fTSl l. we have that 



However, by continuity of singular values 11261 it follows for 
sufficiently small u > (cf.^1}) that P (0,2) > 0, which 
implieo that P {fl2) == p^ as ^2 is independent of p. This in 
turn implies that 

P (Nsd > p^'-"('-)-3«^*'') > P (r^i) . (91) 

With rii being an open set, we have that 

P(^i) 



lim 

p^oo log p 



< inf /(/x), 



2C 



Vka^CRk 



~Vk 



> 



,(^-l*--^(i-Mr,,w)+) 



>0 (85) 



9 

E 



(|nT-nfl|+2j-l)(Ai*-25), 



1^ 



and furthermore for i = 1, • • • , 2(7T, we have that — 

i(l — /Xj*^ (i))^ -^ directly from definition of q and for 

sufficiently small 6. As a result, for k < 2qT we have that 



Nk>Y[p 



(^-i^-^(i-M;,,(„)) 



4=1 



,Eti(^-^(i-ft,,w)^)-|^* 



(86) 
(87) 



= dL{r) - 2{\nT - urI + q)qS, 

< dL{r), (92) 

where the above follows from the monotonicity of the rate 
function 



i=i 



evaluated at 



and setting k = 2(7T we have that 

,(TE|=i(^-(l-M*)^)-3«T5) 



K-2(5--- ,aC-2(5,0,--- ,0} = arg inf /(/x) 



MSf^l 



A^2,T > P^"^ (^-^(l-A'r,.(.,) + )-3.T.) (gg^ 



(c,._,d(r)-3gT5) 



P 



where the last equality follows from 



(89) 
(90) 



and'1 also follows from the fact that, by definition, /(/x*) 
dL{r). 

Consequently from ( |9T1 ) we have that 



Consequently 



— lim 



P(NsD> P' 



;Cr-id(.r}-3qTS\ 






Nsn > NonT > p^"-^^(r)~3qTS 



\ogp 



< dUr), (93) 



V29T 



for small (5 > 0. Given that S can be chosen arbitrarily small, 
and given that events i7i and il2 occur, then the number of 
nodes visited by the SD at layer 2qT is arbitrarily close to the 
upper bound of p^'--"('') 

Now to show that Cr-id{r) > Cr-id{r) — 3qTS, we just have 

ir)-3qTS\ 



to prove that — lim 



P[NsD> P"" 



logp 



< dL{r). 



and directly from the definition of the complexity exponent, 
we have that Cr-id{r) > Cr-id{r) — 3qTd. As the bound holds 

'^In light of the fact that event V has zero measure, what the continuity 
of eigenvalues guarantees is that we can construct a neighborhood of matrices 
around \^ which are full rank, and which have a non zero measure. We also 
note that the matrices V, can be created recursively, starting from a single 
matrix V^^ . 

'^Recall that parameter t was previously introduced as a parameter that 
regulates the near zero behavior of the random variable. 



for arbitrarily small $\delta > 0$, it follows that $c_{rld}(r) = \overline{c}_{rld}(r)$. Directly from [1, Theorem 4], which analyzes the ML-based complexity exponent $c_{ml}(r)$, together with the fact that the ML-based sphere decoder, with or without MMSE preprocessing, shares the same upper bound $\overline{c}_{rld}(r)$ as the MMSE-preprocessed lattice decoder, we have that $c_{ml}(r) = \overline{c}_{rld}(r)$, which in turn implies that
\[ c_{rld}(r) = c_{ml}(r). \]
This establishes Theorem 1 and Corollary 1a. $\square$

Appendix B
Proof for Corollaries 1b and 1c



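Section A-A shows that $\overline{c}_{rld}(r)$ can be obtained as the solution to the constrained maximization problem
\begin{align}
\overline{c}_{rld}(r) = \max_{\mu}\; & T \sum_{j=1}^{n_T} \left( \frac{r}{n_T} - (1 - \mu_j)^+ \right)^+ \notag \\
\text{s.t. } & I(\mu) \le d_L(r), \tag{94a} \\
& \mu_1 \ge \cdots \ge \mu_{n_T} \ge 0. \tag{94b}
\end{align}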



In some cases though, further knowledge of the error perfor- 
mance of the encoder and decoder, can result in an explicit 
characterization of the complexity exponent. Take for instance 
the case of DMT optimal encoding IB], HSI and DMT 
optimal MMSE-preprocessed lattice decoding lID, JS], where 
the constraint $I(\mu) \le d_L(r)$ in (94a) reverts to the constraint $\sum_{j=1}^{n_T} (1 - \mu_j)^+ \ge r$, which may be recognized to correspond to the no-outage region (cf. [3]). In this case $\overline{c}_{rld}(r)$ can then be explicitly obtained from the optimization problem
\begin{align}
\overline{c}_{rld}(r) = \max_{\mu}\; & T \sum_{j=1}^{n_T} \left( \frac{r}{n_T} - (1 - \mu_j)^+ \right)^+ \tag{95a} \\
\text{s.t. } & \sum_{j=1}^{n_T} (1 - \mu_j)^+ \ge r, \tag{95b} \\
& \mu_1 \ge \cdots \ge \mu_{n_T} \ge 0, \tag{95c}
\end{align}
which can be solved in a straightforward manner to give that
\[ \overline{c}_{rld}(r) = \frac{T}{n_T} \left( r(n_T - \lfloor r \rfloor - 1) + \left( n_T \lfloor r \rfloor - r(n_T - 1) \right)^+ \right), \]
describing the upper bound on the complexity exponent for MMSE-preprocessed lattice sphere decoding of DMT optimal full-rate codes, which for minimum delay ($n_T = T$) DMT optimal full-rate codes takes the form
\[ \overline{c}_{rld}(r) = r(n_T - \lfloor r \rfloor - 1) + \left( n_T \lfloor r \rfloor - r(n_T - 1) \right)^+, \tag{96} \]
and which further simplifies to
\[ \overline{c}_{rld}(r) = r(n_T - r), \]
for integer multiplexing gains $r = 0, 1, \cdots, n_T$. In conjunction with the lower bound in Section A-B, under the conditions on layered codes in Corollary 1b, we have that $c_{rld}(r) = \overline{c}_{rld}(r)$, which proves Corollary 1b. $\square$
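The closed-form expression is simple to evaluate, and the simplification to $r(n_T - r)$ at integer multiplexing gains can be checked directly. The following is a minimal sketch using only the Python standard library; the function name `exponent_upper_bound` and the example parameter values are hypothetical.

```python
import math

def exponent_upper_bound(r, n_T, T):
    """Evaluate (T / n_T) * ( r (n_T - floor(r) - 1) + (n_T floor(r) - r (n_T - 1))^+ )."""
    floor_r = math.floor(r)
    plus_term = max(n_T * floor_r - r * (n_T - 1), 0.0)
    return (T / n_T) * (r * (n_T - floor_r - 1) + plus_term)

# Minimum-delay example with n_T = T = 4 (illustrative values).
values_at_integers = [exponent_upper_bound(r, 4, 4) for r in range(5)]
```

At integer $r$ the second term equals $r$, so the bracket collapses to $r(n_T - r)$; between integers the expression is piecewise linear in $r$.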



Moving on to the universal upper bound, we can see from (71) that, regardless of the fading statistics and the corresponding $I(\mu)$, the exponent $\overline{c}_{rld}(r)$ is non-decreasing in $d_L(r)$ and is hence maximized when $d_L(r)$ is itself maximized, i.e., it is maximized in the presence of DMT optimal encoding and decoding. Combined with the fact that the corresponding maximization problem in (95) does not depend on the fading distribution, other than the natural fact that its tail must vanish exponentially fast, this results in the fact that, for any full-rate code and statistical characterization of the channel, the complexity of MMSE-preprocessed lattice SD is universally upper bounded as (cf. [1])
\[ \frac{T}{n_T} \left( r(n_T - \lfloor r \rfloor - 1) + \left( n_T \lfloor r \rfloor - r(n_T - 1) \right)^+ \right). \tag{97} \]
This proves Corollary 1c. $\square$

Appendix C
Proof for Lemma 1

For R^R^ = Mf M^ + a^i (cf. (Ofzl it follows by 

the bounded orthogonality defect of LLL reduced bases that 

there is a constant A'^ > independent of R^ and p, for 

which (cf. IIS and the proof in 1^) 

ffma.(R."') < T^ (98) 



where 



A(Rr) 



A(R,.) = min lIRrcl 
cez»\o 



(99) 



denotes the shortest vector in the lattice generated by Rr. As 
a result we have that 



(Rr)> 



A(R^) 



(100) 



Looking to lower bound ^^^(Rr), we seek a bound on 
A(Rr). Towards this let r' = r — 7 for some r > 7 > 0, 
in which case for s being the transmitted symbol vector, and 
for any § e Z'^ such that s 7^ s, it follows that 

||r-Rr'SJ| = ||(r- R,./s) + Rr/(s -s)|| 

< ||(r-R^-s)|| + ||R,,-(s-§)|| (101) 



and 



lR^,(s- 



> ||r-R,.,si| - j|(r-R^,s)|l 

= ||r-R^,sj| - ||w||. (102) 



From ( IIO2I 1 it is clear that to find a lower bound on A(Rr'); 
we need to lower bound ||r — Rr's|| for all s e Z" and upper 
bound ||w||. Let us, for now, assume that ||w|| < p''. To lower 
bound II r — Rr's||, we draw from the equivalence of MMSE 
preprocessing and the regularized metric (cf. equation (45) in 
ID), and rewrite 



|r-R,,s||' = ||y-M,.§||' 



l§ll'-c, 



(103) 



where c = y^[I - M^(M^M^- + a2,i)-i]V[^,]y > 0. We 
now note that for i = s then ||y — Mr/s|| + a^, ||s|| < p^, 

'^Note the transition to the notation reflecting the dependence of R on r. 



and since the left hand side of ( 11031 ) cannot be negative, and 
furthermore given that c is independent of s, we conclude that 



On the other hand if § ^ p s B, then by definition of B 
we have that af, ||sj| > iP^p"^, and consequently that 



-l|2 



||y-M,„si|Va?,||sf >irV'^. 



We will now proceed to lower bound ||y — M^'S 

a^, ||s|l and then use ( I103l l to lower bound ||r — R^'SJI 

Towards lower bounding ||y - M^'s||^ + a^, ||s||^ we draw From ^W} and ([T08]l we then conclude that 



from Theorem 1 in 
given by 

B^{de 



and we let B be the spherical region 



|y-M,,s|r+a,l||s|r >p" 



(108) 



(109) 



<r2} 



where the radius F > is independent of p and is chosen so 
that di +d2 E TZ for any di, d2 E B. The existence of the set 
B follows by the assumption that is contained in the interior 
of n. Now let 

and for given j > ( > choose 6 > such that 

K 

This may clearly be done for arbitrary C, > 0. We will in the 
following temporarily assume that i^^'+c ^ 1 ^nd prove that, 
together with ||w|| < p^, the two conditions are sufficient for 
A(Rr') > p^ to hold. 

In order to bound the metric for s G Z'^ where § 7^ s, we 
note that i^r'+c > 1 implies that Vd G p " ^^ ^ S n Z", d 7^ 
it is the case that 



Given (1107b and (1109b . for any s g Z'' such that i 7^ s, it is the 
case that ||y — M^'SJI + of., ||s|| > p~^, which combined 
with c < p'' allows for ( 11031 ) to give that 

II ,-. »||9 ■ 2iZ 
||r — R.r/s|f > p " . 

Applying (|99] l and ( 1102b . we have 



(110) 



A(R,. 



> ||r-R.r's|| 

> p - - p^ 
= p « 



(111) 



where the exponential inequality follows from dllOb . Further- 
more we know that 



A(R^) ^ p-^X{Rr') >p~ 



(112) 



1 

4 

jr'+QT 



M^,+^d|| 



HGd 



HGd 



> 

(a) 
> 



> 



where 6 = 7 — <^, r>e>0, and from dlOOb and (1112b it 
follows that cr„ii„(Rr) > p~^ . 

We now note that the above implies that for i^r'+r ^ 1 ^nd 
||w|| < p then cr„,i„(Rr) > /0~^, and thus applying the 
union bound yields 

P (fT„„„(R,) < p"^) = P (^{Ur'+C < 1) U dlwil' > p^) 



where (a) follows from the fact that M,. = ps 
Consequently 



HG. 



i||M^.di|^ >p^, Vdep^'^^^SnZ'',d7^0. (104) 

As TZ is bounded, and as (^ > 0, it holds that TZ C ^p~^B 
for all p > pi, for a sufficiently large pi. This implies that 



< p(z/,/+c < i) + p(^iiwir >p' 

We know from the exponential tail of the Gaussian distri- 
bution that P ( ||w|| > p''] = p^°° and from Lemma 1 in IS) 

that P {i^r'+c < 1) < p-''"^ ('"'+«). Hence 

P (a™„(R,) < p^) < p-dMLir-e) 



se 2p- 



(r' + QT 



B for p > pi since s G p'^" TZ. 



For s,d G ip ^ SnZ'', there exists an s G p ^ BO 

Z*^, §7^8, such that § = d + s. Hence for any s G p -« Bf] 
Z*", we have from ( 1104b that 

i ||M,,(§ - s)f = i i|M,,.di|2 > p^. (105) 

As |jw||^ < p^ it follows that i ||Mr'd||^ > ||w||^ for large 
p, and that 

l|y 

Consequently 



for all r > e > 0. 

The association with the singular values 

$$\sigma_1(R_{r,k}) \leq \cdots \leq \sigma_k(R_{r,k})$$

is made using the interlacing property of singular values of 
sub-matrices, which gives that 



I ^y 



$\sigma_i(R_{r,k}) \geq \sigma_i(R_r)$, $i = 1, \ldots, k$,
and for $k = 1, \ldots, \kappa$, that



(113) 
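The interlacing inequality $\sigma_i(R_{r,k}) \geq \sigma_i(R_r)$ invoked above can be checked numerically. The following is a minimal sketch, assuming that $R_{r,k}$ denotes the $k \times k$ lower-right block of an upper-triangular $R_r$ (this definition lies outside the present section and is taken here as an assumption). The helper names `lower_right_block`, `ascending_singular_values` and `interlacing_holds` are hypothetical and introduced only for illustration.

```python
import numpy as np

def lower_right_block(R, k):
    """Return the k x k lower-right block of a square matrix R."""
    return R[-k:, -k:]

def ascending_singular_values(A):
    """Singular values of A sorted from smallest to largest."""
    return np.sort(np.linalg.svd(A, compute_uv=False))

def interlacing_holds(R, tol=1e-10):
    """Check sigma_i(R_k) >= sigma_i(R), i = 1..k, for every k = 1..kappa.

    R is assumed upper triangular, so the last k rows of R equal [0, R_k]
    and R_k R_k^H is a principal submatrix of R R^H.
    """
    kappa = R.shape[0]
    sig_full = ascending_singular_values(R)
    for k in range(1, kappa + 1):
        sig_k = ascending_singular_values(lower_right_block(R, k))
        if np.any(sig_k < sig_full[:k] - tol):
            return False
    return True

# Random upper-triangular stand-in for R_r (illustrative only).
rng = np.random.default_rng(0)
R = np.triu(rng.standard_normal((6, 6)))
print(interlacing_holds(R))
```

The check relies on the triangular structure: for a general square matrix the lower-right block need not satisfy the same ordering.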



Mwsll^ 



|M^-(s-s)+wf >p^. (106) 



|y-M,,s||' 



a,l||sf >p^. 



(107) 



Finally from the DMT optimality of the exact implementation 
of the regularized lattice decoder ||6l, fS), we have that 

This proves Lemma 1. $\square$
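The identity (103) used in the proof above, together with the non-negativity of $c$, can be sanity-checked numerically. The sketch below assumes a real-valued model, that $R$ is the triangular factor of the thin QR factorization of the stacked matrix $[M; \alpha I]$, and that $r = Q_1^H y$; the matrix sizes, the value of $\alpha$ and the function name `check_identity_103` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def check_identity_103(seed=1, n=6, kappa=4, alpha=0.3):
    """Return (lhs, rhs, c) for a random instance of identity (103)."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, kappa))      # stand-in for M_{r'}
    y = rng.standard_normal(n)               # stand-in for the received vector
    s_hat = rng.integers(-3, 4, size=kappa).astype(float)  # candidate point

    # Thin QR of the regularized (stacked) channel matrix [M; alpha I].
    Q, R = np.linalg.qr(np.vstack([M, alpha * np.eye(kappa)]))
    r = Q[:n].T @ y                          # r = Q_1^H y

    # c = y^H [I - M (M^H M + alpha^2 I)^{-1} M^H] y
    G = M.T @ M + alpha**2 * np.eye(kappa)
    c = y @ (np.eye(n) - M @ np.linalg.solve(G, M.T)) @ y

    lhs = np.sum((r - R @ s_hat) ** 2)
    rhs = np.sum((y - M @ s_hat) ** 2) + alpha**2 * np.sum(s_hat**2) - c
    return lhs, rhs, c

lhs, rhs, c = check_identity_103()
print(abs(lhs - rhs) < 1e-9, c >= 0)
```

Since $c$ does not involve the candidate $\hat{s}$, ranking candidates by $\|r - R\hat{s}\|^2$ or by the regularized metric gives the same ordering.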



Appendix D 
Proof for Lemma 2

For a search radius that grows as $\xi = \sqrt{z \log \rho} \doteq \rho^0$, we
first prove that



for $z > z' > d_L(r)$. Towards establishing the properties of
the equivalent noise $w'$ (cf. (45)), we consider an equivalent
representation of the MMSE-preprocessed lattice decoder and
let (cf. [11])



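$$\begin{bmatrix} M \\ \alpha_r I \end{bmatrix} = QR = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R \in \mathbb{R}^{(n+\kappa)\times\kappa} \tag{114}$$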



be the thin QR factorization of the modified channel matrix,
where $Q_1 = M R^{-1} \in \mathbb{R}^{n\times\kappa}$, $Q_2 = \alpha_r R^{-1} \in \mathbb{R}^{\kappa\times\kappa}$ and
where $R^H R = M^H M + \alpha_r^2 I$. It then follows that for $F = Q_1^H$,
the MMSE-preprocessed lattice decoder is equivalent to
lattice decoding in the presence of channel $R$ and noise



$$w' = -\alpha_r^2 R^{-H} s + R^{-H} M^H w = -\alpha_r Q_2^H s + Q_1^H w.$$
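The equivalence between MMSE preprocessing and the factorization (114) can be illustrated with a short numerical sketch. It assumes a real-valued model $y = Ms + w$ with illustrative dimensions and regularization; the function name `mmse_equivalent_model` is hypothetical. The sketch confirms that $Q_1 = MR^{-1}$, $Q_2 = \alpha R^{-1}$, $R^H R = M^H M + \alpha^2 I$, and that $Q_1^H y = Rs + w'$ with the two equivalent expressions for the noise term.

```python
import numpy as np

def mmse_equivalent_model(M, s, w, alpha):
    """Thin QR of [M; alpha I] and the resulting equivalent model r = R s + w'."""
    n, kappa = M.shape
    Q, R = np.linalg.qr(np.vstack([M, alpha * np.eye(kappa)]))
    Q1, Q2 = Q[:n], Q[n:]
    r = Q1.T @ (M @ s + w)                    # preprocessed observation F y
    noise = -alpha * Q2.T @ s + Q1.T @ w      # equivalent noise term
    return Q1, Q2, R, r, noise

rng = np.random.default_rng(2)
n, kappa, alpha = 5, 3, 0.5                   # illustrative values only
M = rng.standard_normal((n, kappa))
s = rng.integers(-2, 3, size=kappa).astype(float)
w = rng.standard_normal(n)

Q1, Q2, R, r, noise = mmse_equivalent_model(M, s, w, alpha)
R_inv = np.linalg.inv(R)

# Structure of the factorization.
print(np.allclose(Q1, M @ R_inv), np.allclose(Q2, alpha * R_inv))
print(np.allclose(R.T @ R, M.T @ M + alpha**2 * np.eye(kappa)))

# Equivalent channel: r - R s equals both forms of the noise term.
alt_noise = -alpha**2 * R_inv.T @ s + R_inv.T @ M.T @ w
print(np.allclose(r - R @ s, noise), np.allclose(noise, alt_noise))
```

The noise term contains a self-interference part depending on $s$, which is why the subsequent bound treats $\|{-\alpha_r Q_2^H s}\|$ and $\|Q_1^H w\|$ separately.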



< P(||-a,Qfs|| + ||Qfw|| >e) 



Consequently we calculate 



(115) 



(^) p 



- a,.Q^ 



IQ^ 



w 




>i 



< P K[\\yv\\ + sup II - a^sll) > ^ 

= P(K||w||+Ki^>0 

= P [k||w|| > (zlogp)^ - k/v 

< P k||w|| > (2ilogp)2 



(d) 



p(||w|p>iilogp 

P(||w|p > Z2logp) 



- p-^^ 



(116) 



where (a) follows from the MMSE-preprocessed equivalent
channel representation (cf. (114)), and where the inequalities
in (b), (c) and (d) follow for some fixed $K$ that upper bounds
$\sup_{s \in \mathbb{S}_r^\kappa} \|{-\alpha_r s}\|$, and for some arbitrary $z_1, z_2$ satisfying
$z > z_1 > z_2 > 0$ independent of $\rho$. Consequently

P (l|w"|| > e) = P (lIQ^w'll > ^) < P~"' 
for some $0 < z' < z_2$, and as a result

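$$\lim_{\rho\to\infty} \frac{P(\|w''\| > \xi)}{P(\hat{s}_{r\text{-}ld} \neq s)} \leq \lim_{\rho\to\infty} \frac{\rho^{-z'}}{\rho^{-d_L(r)}} = \lim_{\rho\to\infty} \rho^{-(z' - d_L(r))} = 0,$$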



where the last equality follows after choosing the search radius
such that $z > z' > d_L(r)$. This proves Lemma 2. $\square$
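The last step of (116) in the proof above rests on the tail behaviour of $\|w\|^2$ at the threshold $z \log \rho$. The sketch below illustrates that polynomial decay rate under an explicit assumption that is not stated in this section: $w$ has $2m$ real i.i.d. $\mathcal{N}(0, 1/2)$ entries, so that $\|w\|^2$ is Gamma distributed with integer shape $m$ and unit scale. The function names `log_tail_gamma` and `tail_exponent` are hypothetical.

```python
import math

def log_tail_gamma(t, m):
    """log P(X > t) for X ~ Gamma(shape=m, scale=1) with integer m >= 1.

    Uses the closed form P(X > t) = exp(-t) * sum_{k < m} t^k / k!,
    evaluated in the log domain to avoid underflow for large t.
    """
    partial = sum(t**k / math.factorial(k) for k in range(m))
    return -t + math.log(partial)

def tail_exponent(z, log_rho, m):
    """Return -log P(||w||^2 > z log rho) / log rho."""
    return -log_tail_gamma(z * log_rho, m) / log_rho

z, m = 2.0, 2
for decades in (3, 30, 300):          # rho = 10^decades
    log_rho = decades * math.log(10.0)
    print(decades, tail_exponent(z, log_rho, m))
```

Under this assumption the printed exponent approaches $z$ from below as $\rho$ grows, with a gap of order $\log\log\rho / \log\rho$ coming from the polynomial-in-$\log\rho$ factor, which is the sense in which the tail probability is exponentially equal to $\rho^{-z}$.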